Abstract

This work considers the problem of learning the structure of multivariate linear tree models, which
include a variety of directed tree graphical models with continuous, discrete, and mixed latent variables
such as linear-Gaussian models, hidden Markov models, Gaussian mixture models, and Markov evolutionary
trees. The setting is one where we only have samples from certain observed variables in the tree,
and our goal is to estimate the tree structure (i.e., the graph of how the underlying hidden variables are
connected to each other and to the observed variables). We propose the Spectral Recursive Grouping
algorithm, an efficient and simple bottom-up procedure for recovering the tree structure from independent
samples of the observed variables. Our finite sample size bounds for exact recovery of the tree structure
reveal certain natural dependencies on statistical and structural properties of the underlying
joint distribution. Furthermore, our sample complexity guarantees have no explicit dependence on the
dimensionality of the observed variables, making the algorithm applicable to many high-dimensional
settings. At the heart of our algorithm is a spectral quartet test for determining the relative topology of a
quartet of variables from second-order statistics.



1 Introduction 



Graphical models are a central tool in modern machine learning applications, as they provide a natural
methodology for succinctly representing high-dimensional distributions. As such, they have enjoyed much
success in various AI and machine learning applications such as natural language processing, speech recog- 
nition, robotics, computer vision, and bioinformatics. 

The main statistical challenges associated with graphical models include estimation and inference. While 
the body of techniques for probabilistic inference in graphical models is rather rich [29], current methods
for tackling the more challenging problems of parameter and structure estimation are less developed and 
understood, especially in the presence of latent (hidden) variables. The problem of parameter estimation 
involves determining the model parameters from samples of certain observed variables. Here, the predominant 
approach is the expectation maximization (EM) algorithm, and only rather recently is the understanding 
of this algorithm improving |101 [S] . The problem of structure learning is to estimate the underlying graph 
Figure 1: The four possible (undirected) tree topologies over leaves $\{z_1, z_2, z_3, z_4\}$.



of the graphical model. In general, structure learning is NP-hard and becomes even more challenging when 
some variables are unobserved |^. The main approaches for structure estimation are either greedy or local 
search approaches PI 115] or, more recently, based on convex relaxation [25_. 

This work focuses on learning the structure of multivariate latent tree graphical models. Here, the 
underlying graph is a directed tree (e.g., hidden Markov model, binary evolutionary tree), and only samples
from a set of (multivariate) observed variables (the leaves of the tree) are available for learning the structure. 
Latent tree graphical models are relevant in many applications, ranging from computer vision, where one 
may learn object/scene structure from the co-occurrences of objects to aid image understanding fT"; to 
phylogenetics, where the central task is to reconstruct the tree of life from the genetic material of surviving 
species [12] • 

Generally speaking, methods for learning latent tree structure exploit structural properties afforded by 
the tree that are revealed through certain statistical tests over every choice of four variables in the tree. These 
quartet tests, which have origins in structural equation modeling |301 [3] , are hypothesis tests of the relative 
configuration of four (possibly non-adjacent) nodes/variables in the tree (see Figure 1); they are also related to
the four point condition associated with a corresponding additive tree metric induced by the distribution [4 . 
Some early methods for learning tree structure are based on the use of exact correlation statistics or distance 
measurements {e.g., [231 US]). Unfortunately, these methods ignore the crucial aspect of estimation error, 
which ultimately governs their sample complexity. Indeed, this (lack of) robustness to estimation error has 
been quantified for various algorithms (notably, for the popular Neighbor Joining algorithm [lUITU]), and 
therefore serves as a basis for comparing different methods. Subsequent work in the area of mathematical 
phylogenetics has focused on the sample complexity of evolutionary tree reconstruction [TS] (Ml [2Ql [11] . The 
basic model there corresponds to a directed tree over discrete random variables, and much of the recent effort 
deals exclusively in the regime for a certain model parameter (the Kesten-Stigum regime [IH]) that allows for a 
sample complexity that is polylogarithmic in the number of leaves, as opposed to polynomial |201lllj . Finally, 
recent work in machine learning has developed structure learning methods for latent tree graphical models 
that extend beyond the discrete distributions of evolutionary trees [8] , thereby widening their applicability 
to other problem domains. 

This work extends beyond previous studies, which have focused on latent tree models with either discrete 
or scalar Gaussian variables, by directly addressing the multivariate setting where hidden and observed nodes 
may be random vectors rather than scalars. The generality of our techniques allows us to handle a much 
wider class of distributions than before, both in terms of the conditional independence properties imposed by 
the models (i.e., the random vector associated with a node need not follow a distribution that corresponds
to a tree model), as well as other characteristics of the node distributions (e.g., some nodes in the tree could
have discrete state spaces and others continuous, as in a Gaussian mixture model). 

We propose the Spectral Recursive Grouping algorithm for learning multivariate latent tree structure. 
The algorithm has at its core a multivariate spectral quartet test, which extends the classical quartet tests for 
scalar variables by applying spectral techniques from multivariate statistics (specifically canonical correlation 
analysis [2 [22]). Spectral methods have enjoyed recent success in the context of parameter estimation [2TJ 
UM \Tf[ [55] ; our work shows that they are also useful for structure learning. We use the spectral quartet test 
in a simple modification of the recursive grouping algorithm of [5] to perform the tree reconstruction. The 
algorithm is essentially a robust method for reasoning about the results of quartet tests (viewed simply as 
hypothesis tests); the tests either confirm or reject hypotheses about the relative topology over quartets of 



variables. By carefully choosing which tests to consider and properly interpreting their results, the algorithm 
is able to recover the correct latent tree structure (with high probability) in a provably efficient manner, 
in terms of both computational and sample complexity. The recursive grouping procedure is similar to the 
short quartet method from phylogenetics [13] , which also guarantees efficient reconstruction in the context of 
evolutionary trees. However, our method and analysis applies to considerably more general high-dimensional 
settings; for instance, our sample complexity bound is given in terms of natural correlation conditions that 
generalize the more restrictive effective depth conditions of previous works [HI E] . Finally, we note that while 
we do not directly address the question of parameter estimation, provable parameter estimation methods 
may derived using the spectral techniques from [STJ [16] . 

2 Preliminaries 

2.1 Latent variable tree models 

Let $T$ be a connected, directed tree graphical model with leaves $V_{obs} := \{x_1, x_2, \dots, x_n\}$ and internal nodes
$V_{hid} := \{h_1, h_2, \dots, h_m\}$ such that every node has at most one parent. The leaves are termed the observed
variables and the internal nodes hidden variables. Note that all nodes in this work generally correspond to
multivariate random vectors; we will abuse terminology and still refer to these random vectors as random
variables. For any $h \in V_{hid}$, let $\mathrm{Children}_T(h) \subseteq V_T$ denote the children of $h$ in $T$.

Each observed variable $x \in V_{obs}$ is modeled as a random vector in $\mathbb{R}^d$, and each hidden variable $h \in V_{hid}$
as a random vector in $\mathbb{R}^k$. The joint distribution over all the variables $V_T := V_{obs} \cup V_{hid}$ is assumed to satisfy
conditional independence properties specified by the tree structure over the variables. Specifically, for any
disjoint subsets $V_1, V_2, V_3 \subseteq V_T$ such that $V_3$ separates $V_1$ from $V_2$ in $T$, the variables in $V_1$ are conditionally
independent of those in $V_2$ given $V_3$.

2.2 Structural and distributional assumptions 

The class of models considered is specified by the following structural and distributional assumptions.

Condition 1 (Linear conditional means). Fix any hidden variable $h \in V_{hid}$. For each hidden child
$g \in \mathrm{Children}_T(h) \cap V_{hid}$, there exists a matrix $A_{(g|h)} \in \mathbb{R}^{k \times k}$ such that

$$E[g \mid h] = A_{(g|h)} h;$$

and for each observed child $x \in \mathrm{Children}_T(h) \cap V_{obs}$, there exists a matrix $C_{(x|h)} \in \mathbb{R}^{d \times k}$ such that

$$E[x \mid h] = C_{(x|h)} h.$$

We refer to the class of tree graphical models satisfying Condition 1 as linear tree models. Such models
include a variety of continuous and discrete tree distributions (as well as hybrid combinations of the two, 
such as Gaussian mixture models) which are widely used in practice. Continuous linear tree models include 
linear-Gaussian models and Kalman filters. In the discrete case, suppose that the observed variables take 
on d values, and hidden variables take k values. Then, each variable is represented by a binary vector in 
$\{0,1\}^s$, where $s = d$ for the observed variables and $s = k$ for the hidden variables (in particular, if the
variable takes value i, then the corresponding vector is the i-th coordinate vector), and any conditional 
distribution between the variables is represented by a linear relationship. Thus, discrete linear tree models 
include discrete hidden Markov models [16] and Markovian evolutionary trees [21] . 
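
As a small illustration of the discrete encoding just described, the following sketch checks that, with one-hot vectors, the conditional mean of an observed variable given a hidden state is linear in the hidden vector. The conditional probability table `C` and the helper names are hypothetical choices made for this example only; they are not taken from the paper.

```python
import numpy as np

def one_hot(i, s):
    """Return the i-th coordinate vector in {0,1}^s (0-based index)."""
    e = np.zeros(s)
    e[i] = 1.0
    return e

# Hypothetical conditional probability table: column j holds Pr[x = . | h = j],
# with d = 3 observed values and k = 2 hidden values.
C = np.array([[0.7, 0.1],
              [0.2, 0.3],
              [0.1, 0.6]])

def conditional_mean(C, j):
    """E[x | h = e_j]: one-hot vectors weighted by Pr[x = i | h = j]."""
    d = C.shape[0]
    return sum(C[i, j] * one_hot(i, d) for i in range(d))

# Linearity: E[x | h = e_j] coincides with C @ e_j for every hidden state j.
for j in range(C.shape[1]):
    print(j, conditional_mean(C, j), C @ one_hot(j, C.shape[1]))
```

Under this encoding the matrix $C_{(x|h)}$ of Condition 1 is simply the conditional probability table, which is why discrete tree models fall into the linear class.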

In addition to the linearity, the following conditions are assumed in order to recover the hidden tree 
structure. For any matrix $M$, let $\sigma_t(M)$ denote its $t$-th largest singular value.

Condition 2 (Rank condition). The variables in $V_T = V_{hid} \cup V_{obs}$ obey the following rank conditions.

1. For all $h \in V_{hid}$, $E[hh^\top]$ has rank $k$ (i.e., $\sigma_k(E[hh^\top]) > 0$).

2. For all $h \in V_{hid}$ and hidden child $g \in \mathrm{Children}_T(h) \cap V_{hid}$, $A_{(g|h)}$ has rank $k$.

3. For all $h \in V_{hid}$ and observed child $x \in \mathrm{Children}_T(h) \cap V_{obs}$, $C_{(x|h)}$ has rank $k$.

Figure 2: Set of trees $\mathcal{F}_{h_4} = \{\mathcal{T}_1, \mathcal{T}_2, \mathcal{T}_3\}$ obtained if $h_4$ is removed.

The rank condition is a generalization of parameter identifiability conditions in latent variable models tl] 
1211 116| which rules out various (provably) hard instances in discrete variable settings [21 . 

Condition 3 (Non-redundancy condition). Each hidden variable has at least three neighbors. Furthermore,
there exists $\rho_{\max} > 0$ such that for each pair of distinct hidden variables $h, g \in V_{hid}$,

$$\frac{\det(E[hg^\top])^2}{\det(E[hh^\top]) \det(E[gg^\top])} \le \rho_{\max}^2 < 1.$$

The requirement for each hidden node to have three neighbors is natural; otherwise, the hidden node 
can be eliminated. The quantity $\rho_{\max}$ is a natural multivariate generalization of correlation. First, note
that $\rho_{\max} \le 1$, and that if $\rho_{\max} = 1$ is achieved with some $h$ and $g$, then $h$ and $g$ are completely correlated,
implying the existence of a deterministic map between hidden nodes h and g; hence simply merging the two 
nodes into a single node h (or g) resolves this issue. Therefore the non-redundancy condition simply means 
that any two hidden nodes h and g cannot be further reduced to a single node. Clearly, this condition is 
necessary for the goal of identifying the correct tree structure, and it is satisfied as soon as h and g have 
limited correlation in just a single direction. Previous works [Ml US] show that an analogous condition 
ensures identifiability for general latent tree models (and in fact, the conditions are identical in the Gaussian 
case). Condition 3 is therefore a generalization of this condition suitable for the multivariate setting.

Our learning guarantees also require a correlation condition that generalizes the explicit depth conditions
considered in the phylogenetics literature [11, 21]. To state this condition, first define $\mathcal{F}_h$ to be the set of
subtrees of $T$ that remain after a hidden variable $h \in V_{hid}$ is removed from $T$ (see Figure 2). Also, for any
subtree $T'$ of $T$, let $V_{obs}[T'] \subseteq V_{obs}$ be the observed variables in $T'$.

Condition 4 (Correlation condition). There exists $\gamma_{\min} > 0$ such that for all hidden variables $h \in V_{hid}$ and
all triples of subtrees $\{\mathcal{T}_1, \mathcal{T}_2, \mathcal{T}_3\} \subseteq \mathcal{F}_h$ in the forest obtained if $h$ is removed from $T$,

$$\max_{x_1 \in V_{obs}[\mathcal{T}_1],\, x_2 \in V_{obs}[\mathcal{T}_2],\, x_3 \in V_{obs}[\mathcal{T}_3]} \;\; \min_{\{i,j\} \subset \{1,2,3\}} \sigma_k(E[x_i x_j^\top]) \ge \gamma_{\min}.$$

The quantity $\gamma_{\min}$ is related to the effective depth of $T$, which is the maximum graph distance between
a hidden variable and its closest observed variable [lH [8] . The effective depth is at most logarithmic in the 
number of variables (as achieved by a complete binary tree), though it can also be a constant if every hidden 
variable is close to an observed variable (e.g., in a hidden Markov model, the effective depth is 1, even though
the true depth, or diameter, is $m + 1$). If the matrices giving the (conditionally) linear relationship between
neighboring variables in $T$ are all well-conditioned, then $\gamma_{\min}$ is at worst exponentially small in the effective
depth, and therefore at worst polynomially small in the number of variables. 



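Algorithm 1 SpectralQuartetTest on observed variables $\{z_1, z_2, z_3, z_4\}$.

Input: For each pair $\{i,j\} \subset \{1,2,3,4\}$, an empirical estimate $\hat\Sigma_{i,j}$ of the second-moment matrix $E[z_i z_j^\top]$
and a corresponding confidence parameter $\Delta_{i,j} > 0$.
Output: Either a pairing $\{\{z_i, z_j\}, \{z_{i'}, z_{j'}\}\}$ or $\bot$.
1: if there exists a partition of $\{z_1, z_2, z_3, z_4\} = \{z_i, z_j\} \cup \{z_{i'}, z_{j'}\}$ such that

$$\prod_{s=1}^{k} [\sigma_s(\hat\Sigma_{i,j}) - \Delta_{i,j}]_+ \, [\sigma_s(\hat\Sigma_{i',j'}) - \Delta_{i',j'}]_+ \;>\; \prod_{s=1}^{k} (\sigma_s(\hat\Sigma_{i',j}) + \Delta_{i',j}) (\sigma_s(\hat\Sigma_{i,j'}) + \Delta_{i,j'})$$

then return the pairing $\{\{z_i, z_j\}, \{z_{i'}, z_{j'}\}\}$.
2: else return $\bot$.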
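
The test can be sketched in a few lines of NumPy. This is a minimal illustration under one reading of the test: a pairing is returned when the lower-confidence product of singular values for that pairing exceeds the upper-confidence product for a cross pairing, and the sketch abstains otherwise. The function name, the dictionary-based inputs, and the use of `None` for the abstain output are assumptions made for this example.

```python
import numpy as np

def spectral_quartet_test(S, Delta, k):
    """S[(i, j)]: empirical second-moment matrix for i < j in {0, 1, 2, 3}.
    Delta[(i, j)]: confidence width for that pair.
    Returns a pairing (frozenset of two frozensets) or None to abstain."""
    def key(a, b):
        return (a, b) if a < b else (b, a)

    def top_singular_values(a, b):
        return np.linalg.svd(S[key(a, b)], compute_uv=False)[:k]

    def lower(a, b):  # prod_s [sigma_s - Delta]_+
        return np.prod(np.maximum(top_singular_values(a, b) - Delta[key(a, b)], 0.0))

    def upper(a, b):  # prod_s (sigma_s + Delta)
        return np.prod(top_singular_values(a, b) + Delta[key(a, b)])

    partitions = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]
    for (i, j), (i2, j2) in partitions:
        inside = lower(i, j) * lower(i2, j2)
        # Two cross pairings, one for each way of labeling the partition.
        crosses = [upper(i2, j) * upper(i, j2), upper(i2, i) * upper(j, j2)]
        if any(inside > c for c in crosses):
            return frozenset({frozenset({i, j}), frozenset({i2, j2})})
    return None  # abstain
```

With very large confidence widths every lower product is zero, so the sketch abstains; with exact moments and tiny widths it returns the pairing whose within-pair correlations dominate.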



Finally, also define

$$\gamma_{\max} := \max_{\{x_1, x_2\} \subseteq V_{obs}} \{\sigma_1(E[x_1 x_2^\top])\}$$

to be the largest spectral norm of any second-moment matrix between observed variables. Note $\gamma_{\max} \le 1$
in the discrete case, and, in the continuous case, $\gamma_{\max} \le 1$ if each observed random vector is in isotropic
position.

In this work, the Euclidean norm of a vector $x$ is denoted by $\|x\|$, and the (induced) spectral norm of a
matrix $A$ is denoted by $\|A\|$, i.e., $\|A\| := \sigma_1(A) = \sup\{\|Ax\| : \|x\| = 1\}$.

3 Spectral quartet tests 

This section describes the core of our learning algorithm, a spectral quartet test that determines the topology
of the subtree induced by four observed variables $\{z_1, z_2, z_3, z_4\}$. There are four possibilities for the induced
subtree, as shown in Figure 1. Our quartet test either returns the correct induced subtree among the possibilities
in Figure 1(a)-(c), or it outputs $\bot$ to indicate abstention. If the test returns $\bot$, then no guarantees are
provided on the induced subtree topology. If it does return a subtree, then the output is guaranteed to be
the correct induced subtree (with high probability).

The quartet test proposed is described in Algorithm 1 (SpectralQuartetTest). The notation $[a]_+$ denotes
$\max\{0, a\}$ and $[t]$ (for an integer $t$) denotes the set $\{1, 2, \dots, t\}$.

The quartet test is defined with respect to four observed variables $Z := \{z_1, z_2, z_3, z_4\}$. For each pair of
variables $z_i$ and $z_j$, it takes as input an empirical estimate $\hat\Sigma_{i,j}$ of the second-moment matrix $E[z_i z_j^\top]$, and
confidence bound parameters $\Delta_{i,j}$ which are functions of $N$, the number of samples used to compute the
$\hat\Sigma_{i,j}$'s, a confidence parameter $\delta$, and of properties of the distributions of $z_i$ and $z_j$. In practice, one uses
a single threshold $\Delta$ for all pairs, which is tuned by the algorithm. Our theoretical analysis also applies to
this case. The output of the test is either $\bot$ or a pairing of the variables $\{\{z_i, z_j\}, \{z_{i'}, z_{j'}\}\}$. For example,
if the output is the pairing $\{\{z_1, z_2\}, \{z_3, z_4\}\}$, then Figure 1(a) is the output topology.

Even though the configuration in Figure 1(d) is a possibility, the spectral quartet test never returns
$\{\{z_1, z_2, z_3, z_4\}\}$, as there is no correct pairing of $Z$. The topology $\{\{z_1, z_2, z_3, z_4\}\}$ can be viewed as a
degenerate case of $\{\{z_1, z_2\}, \{z_3, z_4\}\}$ (say) where the hidden variables $h$ and $g$ are deterministically identical,
and Condition 3 fails to hold with respect to $h$ and $g$.

3.1 Properties of the spectral quartet test 

With exact second moments: The spectral quartet test is motivated by the following lemma, which
shows the relationship between the singular values of second-moment matrices of the $z_i$'s and the induced
topology among them in the latent tree. Let $\det_k(M) := \prod_{s=1}^{k} \sigma_s(M)$ denote the product of the $k$ largest
singular values of a matrix $M$.



Lemma 1 (Perfect quartet test). Suppose that the observed variables $Z = \{z_1, z_2, z_3, z_4\}$ have the true
induced tree topology shown in Figure 1(a), and the tree model satisfies Condition 1 and Condition 2. Then

$$\frac{\det_k(E[z_1 z_3^\top]) \det_k(E[z_2 z_4^\top])}{\det_k(E[z_1 z_2^\top]) \det_k(E[z_3 z_4^\top])} = \frac{\det_k(E[z_1 z_4^\top]) \det_k(E[z_2 z_3^\top])}{\det_k(E[z_1 z_2^\top]) \det_k(E[z_3 z_4^\top])} = \frac{\det(E[hg^\top])^2}{\det(E[hh^\top]) \det(E[gg^\top])} \le 1 \qquad (1)$$

and $\det_k(E[z_1 z_3^\top]) \det_k(E[z_2 z_4^\top]) = \det_k(E[z_1 z_4^\top]) \det_k(E[z_2 z_3^\top])$.

This lemma shows that given the true second-moment matrices and assuming Condition 3, the inequality
in (1) becomes strict and thus can be used to deduce the correct topology: the correct pairing
is $\{\{z_i, z_j\}, \{z_{i'}, z_{j'}\}\}$ if and only if

$$\det_k(E[z_i z_j^\top]) \det_k(E[z_{i'} z_{j'}^\top]) > \det_k(E[z_{i'} z_j^\top]) \det_k(E[z_i z_{j'}^\top]).$$
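
The determinant identity behind the perfect quartet test can be checked numerically on exact second moments. The sketch below builds a hypothetical quartet in which $z_1, z_2$ are children of $h$ and $z_3, z_4$ are children of $g$; all matrices (`A`, the noise covariance, the random loading matrices in `C`) are made-up values for illustration, and additive observation noise is omitted because it does not affect cross-moments between distinct variables.

```python
import numpy as np

def det_k(M, k):
    """Product of the k largest singular values of M."""
    return float(np.prod(np.linalg.svd(M, compute_uv=False)[:k]))

rng = np.random.default_rng(0)
k, d = 2, 4
Shh = np.eye(k)                                # E[h h^T]
A = np.array([[0.6, 0.2], [0.0, 0.5]])         # hypothetical A_(g|h)
Shg = Shh @ A.T                                # E[h g^T]
Sgg = A @ Shh @ A.T + 0.5 * np.eye(k)          # E[g g^T] with independent noise
C = [rng.standard_normal((d, k)) for _ in range(4)]  # hypothetical C_(z_i|parent)

# Exact cross second moments implied by the linear conditional means.
E12 = C[0] @ Shh @ C[1].T
E34 = C[2] @ Sgg @ C[3].T
E13 = C[0] @ Shg @ C[2].T
E24 = C[1] @ Shg @ C[3].T
E14 = C[0] @ Shg @ C[3].T
E23 = C[1] @ Shg @ C[2].T

correct = det_k(E12, k) * det_k(E34, k)
cross_a = det_k(E13, k) * det_k(E24, k)
cross_b = det_k(E14, k) * det_k(E23, k)
rho_sq = np.linalg.det(Shg) ** 2 / (np.linalg.det(Shh) * np.linalg.det(Sgg))

print(cross_a / correct, cross_b / correct, rho_sq)
```

The two cross products agree, and their ratio to the correct-pairing product equals the squared multivariate correlation between $h$ and $g$, which is strictly below one here.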

Reliability: The next lemma shows that even if the singular values of $E[z_i z_j^\top]$ are not known exactly, then
with valid confidence intervals (that contain these singular values) a robust test can be constructed which
is reliable in the following sense: if it does not output $\bot$, then the output topology is indeed the correct
topology.

Lemma 2 (Reliability). Consider the setup of Lemma 1, and suppose that Figure 1(a) is the correct topology.
If for all pairs $\{z_i, z_j\} \subset Z$ and all $s \in [k]$, $\sigma_s(\hat\Sigma_{i,j}) - \Delta_{i,j} \le \sigma_s(E[z_i z_j^\top]) \le \sigma_s(\hat\Sigma_{i,j}) + \Delta_{i,j}$, and if
SpectralQuartetTest returns a pairing $\{\{z_i, z_j\}, \{z_{i'}, z_{j'}\}\}$, then $\{\{z_i, z_j\}, \{z_{i'}, z_{j'}\}\} = \{\{z_1, z_2\}, \{z_3, z_4\}\}$.

In other words, the spectral quartet test never returns an incorrect pairing as long as the singular values
of $E[z_i z_j^\top]$ lie in an interval of length $2\Delta_{i,j}$ around the singular values of $\hat\Sigma_{i,j}$. The lemma below shows how
to set the $\Delta_{i,j}$'s as a function of $N$, $\delta$, and properties of the distributions of $z_i$ and $z_j$ so that this required
event holds with probability at least $1 - \delta$. We remark that any valid confidence intervals may be used;
the one described below is particularly suitable when the observed variables are high-dimensional random
vectors.

Lemma 3 (Confidence intervals). Let $Z = \{z_1, z_2, z_3, z_4\}$ be four random vectors. Let $\|z_i\| \le M_i$ almost
surely, and let $\delta \in (0, 1/6)$. If each empirical second-moment matrix $\hat\Sigma_{i,j}$ is computed using $N$ iid copies of
$z_i$ and $z_j$, and if

$$\bar d_{i,j} := \frac{E[\|z_i\|^2 \|z_j\|^2] - \mathrm{tr}(E[z_i z_j^\top] E[z_i z_j^\top]^\top)}{\max\{\|E[\|z_j\|^2 z_i z_i^\top]\|, \|E[\|z_i\|^2 z_j z_j^\top]\|\}}, \qquad t_{i,j} := 1.55 \ln(24 \bar d_{i,j} / \delta),$$

$$\Delta_{i,j} \ge \sqrt{\frac{2 \max\{\|E[\|z_j\|^2 z_i z_i^\top]\|, \|E[\|z_i\|^2 z_j z_j^\top]\|\} \, t_{i,j}}{N}} + \frac{M_i M_j t_{i,j}}{3N},$$

then with probability $1 - \delta$, for all pairs $\{z_i, z_j\} \subset Z$ and all $s \in [k]$,

$$\sigma_s(\hat\Sigma_{i,j}) - \Delta_{i,j} \le \sigma_s(E[z_i z_j^\top]) \le \sigma_s(\hat\Sigma_{i,j}) + \Delta_{i,j}. \qquad (2)$$
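
To make the shape of the confidence width concrete, here is a minimal sketch that evaluates a width of the form $\sqrt{2Bt/N} + M_i M_j t/(3N)$. The choice $t = 1.55 \ln(24 \bar d / \delta)$ is an assumption of this example: it is Remark 1's threshold with the failure probability $\delta$ split evenly over the six pairs of a quartet. The function and argument names are likewise illustrative.

```python
import math

def confidence_width(B, M_i, M_j, d_bar, N, delta):
    """Bernstein-style confidence width for the singular values of one
    second-moment matrix.

    B      : bound on max{||E[||z_j||^2 z_i z_i^T]||, ||E[||z_i||^2 z_j z_j^T]||}
    M_i/M_j: almost-sure bounds on ||z_i|| and ||z_j||
    d_bar  : intrinsic dimension quantity (at most max{dim z_i, dim z_j})
    N      : number of iid samples
    delta  : overall failure probability for the quartet (assumed split 6 ways)
    """
    t = 1.55 * math.log(24.0 * d_bar / delta)
    return math.sqrt(2.0 * B * t / N) + M_i * M_j * t / (3.0 * N)

for N in (100, 10_000, 1_000_000):
    print(N, confidence_width(B=1.0, M_i=1.0, M_j=1.0, d_bar=2.0, N=N, delta=0.05))
```

The first term dominates for large $N$, so the width shrinks at the usual $1/\sqrt{N}$ rate, with only a logarithmic dependence on $\bar d$ and $1/\delta$.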



Conditions for returning a correct pairing: The conditions under which SpectralQuartetTest returns
an induced topology (as opposed to $\bot$) are now provided.

An important quantity in this analysis is the level of non-redundancy between the hidden variables $h$ and
$g$. Let

$$\rho^2 := \frac{\det(E[hg^\top])^2}{\det(E[hh^\top]) \det(E[gg^\top])}. \qquad (3)$$

If Figure 1(a) is the correct induced topology among $\{z_1, z_2, z_3, z_4\}$, then the smaller $\rho$ is, the greater the gap
between $\det_k(E[z_1 z_2^\top]) \det_k(E[z_3 z_4^\top])$ and either of $\det_k(E[z_1 z_3^\top]) \det_k(E[z_2 z_4^\top])$ and $\det_k(E[z_1 z_4^\top]) \det_k(E[z_2 z_3^\top])$.
Therefore, $\rho$ also governs how small the $\Delta_{i,j}$ need to be for the quartet test to return a correct pairing; this
is quantified in Lemma 4. Note that Condition 3 implies $\rho \le \rho_{\max} < 1$.



Lemma 4 (Correct pairing). Suppose that (i) the observed variables $Z = \{z_1, z_2, z_3, z_4\}$ have the true induced
tree topology shown in Figure 1(a); (ii) the tree model satisfies Condition 1, Condition 2, and $\rho < 1$ (where
$\rho$ is defined in (3)); and (iii) the confidence bounds in (2) hold for all $\{i,j\}$ and all $s \in [k]$. If

$$\Delta_{i,j} < \frac{1}{8k} \cdot \min\left\{1, \frac{1}{\rho} - 1\right\} \cdot \min_{\{i,j\}} \{\sigma_k(E[z_i z_j^\top])\}$$

for each pair $\{i,j\}$, then SpectralQuartetTest returns the correct pairing $\{\{z_1, z_2\}, \{z_3, z_4\}\}$.



4 The Spectral Recursive Grouping algorithm 

The Spectral Recursive Grouping algorithm, presented as Algorithm 2, uses the spectral quartet test
discussed in the previous section to estimate the structure of a multivariate latent tree distribution from iid
samples of the observed leaf variables.¹ The algorithm is a modification of the recursive grouping (RG)
procedure proposed in [8]. RG builds the tree in a bottom-up fashion, where the initial working set of
variables are the observed variables. The variables in the working set always correspond to roots of disjoint
subtrees of $T$ discovered by the algorithm. (Note that because these subtrees are rooted, they naturally
induce parent/child relationships, but these may differ from those implied by the edge directions in $T$.) In
each iteration, the algorithm determines which variables in the working set to combine. If the variables are
combined as siblings, then a new hidden variable is introduced as their parent and is added to the working
set, and its children are removed. If the variables are combined as neighbors (parent/child), then the child
is removed from the working set. The process repeats until the entire tree is constructed.

Our modification of RG uses the spectral quartet tests from Section 3 to decide which subtree roots in the
current working set to combine. Note that because the test may return $\bot$ (a null result), our algorithm uses
the tests to rule out possible siblings or neighbors among variables in the working set; this is encapsulated
in the subroutine Mergeable (Algorithm 3), which tests quartets of observed variables (leaves) in the subtrees
rooted at working set variables. For any pair $\{u, v\} \subseteq \mathcal{R}$ submitted to the subroutine (along with the current
working set $\mathcal{R}$ and leaf sets $\mathcal{L}[\cdot]$):

• Mergeable returns false if there is evidence (provided by a quartet test) that $u$ and $v$ should first be
joined with different variables ($u'$ and $v'$, respectively) before joining with each other; and

• Mergeable returns true if no quartet test provides such evidence.

The subroutine is also used by the subroutine Relationship (Algorithm 4), which determines whether a
candidate pair of variables should be merged as neighbors (parent/child) or as siblings: essentially, to check if
$u$ is a parent of $v$, it checks if $v$ is a sibling of each child of $u$. The use of unreliable estimates of long-range
correlations is avoided by only considering highly-correlated variables as candidate pairs to merge (where
correlation is measured using observed variables in their corresponding subtrees as proxies). This leads to a
sample-efficient algorithm for recovering the hidden tree structure.

The Spectral Recursive Grouping algorithm enjoys the following guarantee. 

Theorem 1. Let $\eta \in (0,1)$. Assume the directed tree graphical model $T$ over variables (random vectors)
$V_T = V_{obs} \cup V_{hid}$ satisfies Conditions 1, 2, 3, and 4. Suppose the Spectral Recursive Grouping algorithm
(Algorithm 2) is provided $N$ independent samples from the distribution over $V_{obs}$, and uses parameters given
by

$$\Delta_{x_i,x_j} := \sqrt{\frac{2 B_{x_i,x_j} t_{x_i,x_j}}{N}} + \frac{M_{x_i} M_{x_j} t_{x_i,x_j}}{3N} \qquad (4)$$

¹To simplify notation, we assume that the estimated second-moment matrices $\hat\Sigma_{x,y}$ and threshold parameters $\Delta_{x,y} > 0$ for
all pairs $\{x, y\} \subseteq V_{obs}$ are globally defined. In particular, we assume the spectral quartet tests use these quantities.



Algorithm 2 Spectral Recursive Grouping.

Input: Empirical second-moment matrices $\hat\Sigma_{x,y}$ for all pairs $\{x,y\} \subseteq V_{obs}$ computed from $N$ iid samples
from the distribution over $V_{obs}$; threshold parameters $\Delta_{x,y}$ for all pairs $\{x,y\} \subseteq V_{obs}$.
Output: Tree structure $\hat T$ or "failure".
 1: let $\mathcal{R} := V_{obs}$, and for all $x \in \mathcal{R}$, $T[x]$ := rooted single-node tree $x$ and $\mathcal{L}[x] := \{x\}$.
 2: while $|\mathcal{R}| > 1$ do
 3:   let pair $\{u,v\} \in \{\{\tilde u, \tilde v\} \subseteq \mathcal{R} : \mathrm{Mergeable}(\mathcal{R}, \mathcal{L}[\cdot], \tilde u, \tilde v) = \text{true}\}$ be such that $\max\{\sigma_k(\hat\Sigma_{x,y}) : (x,y) \in \mathcal{L}[u] \times \mathcal{L}[v]\}$ is maximized. If no such pair exists, then halt and return "failure".
 4:   let result := Relationship($\mathcal{R}$, $\mathcal{L}[\cdot]$, $T[\cdot]$, $u$, $v$).
 5:   if result = "siblings" then
 6:     Create a new variable $h$, create subtree $T[h]$ rooted at $h$ by joining $T[u]$ and $T[v]$ to $h$ with edges $\{h,u\}$ and $\{h,v\}$, and set $\mathcal{L}[h] := \mathcal{L}[u] \cup \mathcal{L}[v]$.
 7:     Add $h$ to $\mathcal{R}$, and remove $u$ and $v$ from $\mathcal{R}$.
 8:   else if result = "u is parent of v" then
 9:     Modify subtree $T[u]$ by joining $T[v]$ to $u$ with an edge $\{u, v\}$, and modify $\mathcal{L}[u] := \mathcal{L}[u] \cup \mathcal{L}[v]$.
10:     Remove $v$ from $\mathcal{R}$.
11:   else if result = "v is parent of u" then
12:     {Analogous to above case.}
13:   end if
14: end while
15: Return $\hat T := T[h]$ where $\mathcal{R} = \{h\}$.



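Algorithm 3 Subroutine Mergeable($\mathcal{R}$, $\mathcal{L}[\cdot]$, $u$, $v$).

Input: Set of nodes $\mathcal{R}$; leaf sets $\mathcal{L}[v]$ for all $v \in \mathcal{R}$; distinct $u, v \in \mathcal{R}$.
Output: true or false.
1: if there exist distinct $u', v' \in \mathcal{R} \setminus \{u,v\}$ and $(x, y, x', y') \in \mathcal{L}[u] \times \mathcal{L}[v] \times \mathcal{L}[u'] \times \mathcal{L}[v']$ s.t.
SpectralQuartetTest($\{x, y, x', y'\}$) returns $\{\{x,x'\},\{y,y'\}\}$ or $\{\{x,y'\},\{x',y\}\}$ then return false.
2: else return true.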
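
The Mergeable subroutine amounts to a search for one contradicting quartet. The sketch below assumes a quartet test supplied as a callback that returns either a pairing (a collection of two frozensets of leaves) or `None` when it abstains; that interface, like all names here, is a choice made for illustration.

```python
from itertools import product

def mergeable(R, leaves, u, v, quartet_test):
    """Return False if some quartet test separates u from v, else True.

    R            : iterable of working-set nodes
    leaves       : dict mapping each node in R to its set of leaves
    quartet_test : callable (x, y, x2, y2) -> pairing or None (abstain)
    """
    others = [w for w in R if w != u and w != v]
    for u2 in others:
        for v2 in others:
            if u2 == v2:
                continue
            for x, y, x2, y2 in product(leaves[u], leaves[v], leaves[u2], leaves[v2]):
                result = quartet_test(x, y, x2, y2)
                if result is None:
                    continue  # abstention provides no evidence
                # Pairings that put x and y on opposite sides of the quartet.
                separating = [
                    {frozenset({x, x2}), frozenset({y, y2})},
                    {frozenset({x, y2}), frozenset({x2, y})},
                ]
                if set(result) in separating:
                    return False
    return True
```

Because an abstaining test never produces evidence, a test that always abstains makes every pair mergeable; the subroutine only ever rules pairs out.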



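where

$$B_{x_i,x_j} := \max\{\|E[\|x_i\|^2 x_j x_j^\top]\|, \|E[\|x_j\|^2 x_i x_i^\top]\|\}, \qquad M_{x_i} \ge \|x_i\| \text{ almost surely},$$

$$\bar d_{x_i,x_j} := \frac{E[\|x_i\|^2 \|x_j\|^2] - \mathrm{tr}(E[x_i x_j^\top] E[x_j x_i^\top])}{\max\{\|E[\|x_j\|^2 x_i x_i^\top]\|, \|E[\|x_i\|^2 x_j x_j^\top]\|\}}, \qquad t_{x_i,x_j} := 4 \ln(4 \bar d_{x_i,x_j} n / \eta).$$

Let $B := \max_{x_i, x_j \in V_{obs}} \{B_{x_i,x_j}\}$, $M := \max_{x_i \in V_{obs}} \{M_{x_i}\}$, $t := \max_{x_i, x_j \in V_{obs}} \{t_{x_i,x_j}\}$. If

$$N > \frac{200 \cdot k^2 \cdot B \cdot t}{\left(\frac{\gamma_{\min}^2}{\gamma_{\max}} \cdot (1 - \rho_{\max})\right)^2} + \frac{7 \cdot k \cdot M^2 \cdot t}{\frac{\gamma_{\min}^2}{\gamma_{\max}} \cdot (1 - \rho_{\max})},$$

then with probability at least $1 - \eta$, the Spectral Recursive Grouping algorithm returns a tree $\hat T$ with the same
undirected graph structure as $T$.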

Consistency is implied by the above theorem with an appropriate scaling of $\eta$ with $N$. The theorem
reveals that the sample complexity of the algorithm depends solely on intrinsic spectral properties of the 
distribution. Note that there is no explicit dependence on the dimensions of the observable variables, which 
makes the result applicable to high-dimensional settings. 



Algorithm 4 Subroutine Relationship($\mathcal{R}$, $\mathcal{L}[\cdot]$, $T[\cdot]$, $u$, $v$).

Input: Set of nodes $\mathcal{R}$; leaf sets $\mathcal{L}[v]$ for all $v \in \mathcal{R}$; rooted subtrees $T[v]$ for all $v \in \mathcal{R}$; distinct $u, v \in \mathcal{R}$.
Output: "siblings", "u is parent of v" ("$u \to v$"), or "v is parent of u" ("$v \to u$").
1: if $u$ is a leaf then assert "$u \not\to v$".
2: if $v$ is a leaf then assert "$v \not\to u$".
3: let $\mathcal{R}[w] := (\mathcal{R} \setminus \{w\}) \cup \{w' : w' \text{ is a child of } w \text{ in } T[w]\}$ for each $w \in \{u, v\}$.
4: if there exists child $u_1$ of $u$ in $T[u]$ s.t. Mergeable($\mathcal{R}[u]$, $\mathcal{L}[\cdot]$, $u_1$, $v$) = false then assert "$u \not\to v$".
5: if there exists child $v_1$ of $v$ in $T[v]$ s.t. Mergeable($\mathcal{R}[v]$, $\mathcal{L}[\cdot]$, $u$, $v_1$) = false then assert "$v \not\to u$".
6: if both "$u \not\to v$" and "$v \not\to u$" were asserted then return "siblings".
7: else if "$u \not\to v$" was asserted then return "v is parent of u" ("$v \to u$").
8: else return "u is parent of v" ("$u \to v$").
A Sample-based confidence intervals for singular values

We show how to derive confidence bounds for the singular values of $\Sigma_{i,j} := E[z_i z_j^\top]$ for $\{i,j\} \subset \{1,2,3,4\}$
from $N$ iid copies of the random vectors $\{z_1, z_2, z_3, z_4\}$. That is, we show how to set $\Delta_{i,j}$ so that, with high
probability,

$$\sigma_s(\hat\Sigma_{i,j}) - \Delta_{i,j} \le \sigma_s(\Sigma_{i,j}) \le \sigma_s(\hat\Sigma_{i,j}) + \Delta_{i,j}$$

for all $\{i,j\}$ and all $s \in [k]$.

We state exponential tail inequalities for the spectral norm of the estimation error $\hat\Sigma_{i,j} - \Sigma_{i,j}$. The first
exponential tail inequality is stated for general random vectors under Bernstein-type conditions, and the
second is specific to random vectors in the discrete setting.

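Lemma 5. Let $z_i$ and $z_j$ be random vectors such that $\|z_i\| \le M_i$ and $\|z_j\| \le M_j$ almost surely, and let

$$\bar d_{i,j} := \frac{E[\|z_i\|^2 \|z_j\|^2] - \mathrm{tr}(\Sigma_{i,j} \Sigma_{i,j}^\top)}{\max\{\|E[\|z_j\|^2 z_i z_i^\top]\|, \|E[\|z_i\|^2 z_j z_j^\top]\|\}} \le \max\{\dim(z_i), \dim(z_j)\}.$$

Let $\Sigma_{i,j} := E[z_i z_j^\top]$ and let $\hat\Sigma_{i,j}$ be the empirical average of $N$ independent copies of $z_i z_j^\top$. Pick any $t > 0$.
With probability at least $1 - 4 \bar d_{i,j} t (e^t - t - 1)^{-1}$,

$$\|\hat\Sigma_{i,j} - \Sigma_{i,j}\| \le \sqrt{\frac{2 \max\{\|E[\|z_j\|^2 z_i z_i^\top]\|, \|E[\|z_i\|^2 z_j z_j^\top]\|\} \, t}{N}} + \frac{M_i M_j t}{3N}.$$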



Remark 1. For any $\delta \in (0, 1/6)$, we have $4 \bar d_{i,j} t (e^t - t - 1)^{-1} \le \delta$ provided that $t \ge 1.55 \ln(4 \bar d_{i,j} / \delta)$.
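
Remark 1 is easy to sanity-check numerically. The sketch below evaluates the failure probability $4 \bar d\, t (e^t - t - 1)^{-1}$ at the threshold $t = 1.55 \ln(4 \bar d / \delta)$ for a few values of $\bar d \ge 1$ and $\delta < 1/6$. The grid of values and the function names are arbitrary choices for illustration; this is a spot check, not a proof.

```python
import math

def tail_probability(d_bar, t):
    """Failure probability 4 * d_bar * t / (e^t - t - 1) from Lemma 5."""
    return 4.0 * d_bar * t / (math.exp(t) - t - 1.0)

def t_threshold(d_bar, delta):
    """Threshold on t suggested by Remark 1."""
    return 1.55 * math.log(4.0 * d_bar / delta)

for d_bar in (1.0, 10.0, 1000.0):
    for delta in (0.16, 0.1, 0.01, 1e-6):
        t = t_threshold(d_bar, delta)
        print(d_bar, delta, tail_probability(d_bar, t) <= delta)
```

On this grid the bound holds with room to spare for small $\delta$, reflecting that the constant 1.55 is chosen to cover the worst case near $\delta = 1/6$.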



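Proof. Define the random matrix

$$Z := \begin{bmatrix} 0 & z_i z_j^\top \\ z_j z_i^\top & 0 \end{bmatrix}.$$

Let $Z_1, \dots, Z_N$ be independent copies of $Z$. Then

$$\Pr\left[\|\hat\Sigma_{i,j} - \Sigma_{i,j}\| > t\right] = \Pr\left[\left\|\frac{1}{N} \sum_{\ell=1}^{N} Z_\ell - E[Z]\right\| > t\right].$$

Note that

$$E[Z^2] = E\begin{bmatrix} \|z_j\|^2 z_i z_i^\top & 0 \\ 0 & \|z_i\|^2 z_j z_j^\top \end{bmatrix},$$

so by convexity,

$$\|E[Z^2] - E[Z]^2\| \le \|E[Z^2]\| \le \max\{\|E[\|z_j\|^2 z_i z_i^\top]\|, \|E[\|z_i\|^2 z_j z_j^\top]\|\}$$

and

$$\mathrm{tr}(E[Z^2] - E[Z]^2) = \mathrm{tr}(E[\|z_j\|^2 z_i z_i^\top]) + \mathrm{tr}(E[\|z_i\|^2 z_j z_j^\top]) - \mathrm{tr}(\Sigma_{i,j} \Sigma_{i,j}^\top) - \mathrm{tr}(\Sigma_{i,j}^\top \Sigma_{i,j})$$

$$= 2\left(E[\|z_i\|^2 \|z_j\|^2] - \mathrm{tr}(\Sigma_{i,j} \Sigma_{i,j}^\top)\right).$$

Moreover,

$$\|Z\| \le \|z_i\| \|z_j\| \le M_i M_j.$$

By the matrix Bernstein inequality [17], for any $t > 0$,

$$\Pr\left[\|\hat\Sigma_{i,j} - \Sigma_{i,j}\| > \sqrt{\frac{2 \max\{\|E[\|z_j\|^2 z_i z_i^\top]\|, \|E[\|z_i\|^2 z_j z_j^\top]\|\} \, t}{N}} + \frac{M_i M_j t}{3N}\right]$$

$$\le 2 \cdot \frac{2\left(E[\|z_i\|^2 \|z_j\|^2] - \mathrm{tr}(\Sigma_{i,j} \Sigma_{i,j}^\top)\right)}{\max\{\|E[\|z_j\|^2 z_i z_i^\top]\|, \|E[\|z_i\|^2 z_j z_j^\top]\|\}} \cdot t (e^t - t - 1)^{-1} = 4 \bar d_{i,j} t (e^t - t - 1)^{-1}.$$

The claim follows. □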



In the case of discrete random variables (modeled as random vectors as described in Section 2), the
following lemma from [16] can give a tighter exponential tail inequality.

Lemma 6 ([16]). Let $z_i$ and $z_j$ be random vectors, each with support on the vertices of a probability simplex.
Let $\Sigma_{i,j} := E[z_i z_j^\top]$ and let $\hat\Sigma_{i,j}$ be the empirical average of $N$ independent copies of $z_i z_j^\top$. Pick any $t > 0$.
With probability at least $1 - e^{-t}$,

$$\|\hat\Sigma_{i,j} - \Sigma_{i,j}\| \le \|\hat\Sigma_{i,j} - \Sigma_{i,j}\|_F \le \frac{1 + \sqrt{t}}{\sqrt{N}}$$

(where $\|A\|_F$ denotes the Frobenius norm of a matrix $A$).

For simplicity, we only work with Lemma 5, although it is easy to translate all of our results by changing
the tail inequality. The proof of Lemma 3 is immediate from combining Lemma 5 and Weyl's Theorem.
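
The role of Weyl's Theorem here is to convert a spectral-norm bound on the estimation error into a bound on every singular value: each singular value moves by at most the spectral norm of the perturbation. A quick numerical illustration, using arbitrary random matrices chosen only for this example:

```python
import numpy as np

def singular_value_shift(S_hat, S):
    """Return (max_s |sigma_s(S_hat) - sigma_s(S)|, spectral norm of S_hat - S)."""
    sv_hat = np.linalg.svd(S_hat, compute_uv=False)
    sv = np.linalg.svd(S, compute_uv=False)
    shift = float(np.max(np.abs(sv_hat - sv)))
    error = float(np.linalg.norm(S_hat - S, 2))
    return shift, error

rng = np.random.default_rng(0)
S = rng.standard_normal((5, 4))                   # stand-in for a true second-moment matrix
S_hat = S + 0.01 * rng.standard_normal((5, 4))    # stand-in for its empirical estimate
shift, error = singular_value_shift(S_hat, S)
print(shift, error)
```

In every run the largest singular-value shift is bounded by the spectral norm of the error, so a width that bounds the latter also gives two-sided intervals for each singular value.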
Lemma 3 provides some guidelines on how to set the $\Delta_{i,j}$ as functions of $N$, $\delta$, and properties of $z_i$ and
$z_j$. The dependence on the properties of $z_i$ and $z_j$ comes through the quantities $M_i$, $M_j$, $\bar d_{i,j}$, and

$$B_{i,j} := \max\{\|E[\|z_j\|^2 z_i z_i^\top]\|, \|E[\|z_i\|^2 z_j z_j^\top]\|\}.$$

In practice, one may use plug-in estimates for these quantities, or use loose upper bounds based on weaker
knowledge of the distribution. For instance, $\bar d_{i,j}$ is at most $\max\{\dim(z_i), \dim(z_j)\}$, the larger of the explicit
vector dimensions of $z_i$ and $z_j$. Also, if the maximum directional standard deviation $\sigma_*$ of any $z_i$ is known,
then $B_{i,j} \le \max\{M_i^2, M_j^2\} \sigma_*^2$. We note that as these are additive confidence intervals, some dependence on
the properties of $z_i$ and $z_j$ is inevitable.
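
One way to form the plug-in estimates mentioned above is to replace each expectation by a sample average. The sketch below does this for paired samples of $z_i$ and $z_j$. The function name and return layout are assumptions of this example, and using the largest observed norm as a stand-in for the almost-sure bound $M_i$ is a heuristic that can underestimate the true bound.

```python
import numpy as np

def plug_in_quantities(Zi, Zj):
    """Plug-in estimates from paired samples.

    Zi: array of shape (N, d_i), rows are samples of z_i
    Zj: array of shape (N, d_j), rows are the matching samples of z_j
    Returns (S_hat, B, d_bar, M_i, M_j).
    """
    N = Zi.shape[0]
    ni2 = np.sum(Zi ** 2, axis=1)          # ||z_i||^2 per sample
    nj2 = np.sum(Zj ** 2, axis=1)          # ||z_j||^2 per sample
    S_hat = Zi.T @ Zj / N                  # empirical E[z_i z_j^T]
    # Spectral norms of empirical E[||z_j||^2 z_i z_i^T] and E[||z_i||^2 z_j z_j^T].
    B_i = np.linalg.norm((Zi * nj2[:, None]).T @ Zi / N, 2)
    B_j = np.linalg.norm((Zj * ni2[:, None]).T @ Zj / N, 2)
    B = max(B_i, B_j)
    d_bar = (np.mean(ni2 * nj2) - np.trace(S_hat @ S_hat.T)) / B
    M_i = float(np.sqrt(ni2.max()))        # heuristic: largest observed norm
    M_j = float(np.sqrt(nj2.max()))
    return S_hat, float(B), float(d_bar), M_i, M_j
```

The returned `d_bar` never exceeds the larger vector dimension, matching the loose upper bound stated in the text.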

B Analysis of the spectral quartet test 

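For any hidden variable $h \in V_{hid}$, let $\mathrm{Descendants}_T(h) \subseteq V_T$ be the descendants of $h$ in $T$. For any
$g \in \mathrm{Descendants}_T(h) \cap V_{hid}$ such that the (directed) path from $h$ to $g$ is $h \to g_1 \to g_2 \to \cdots \to g_q = g$, define
$A_{(g|h)} \in \mathbb{R}^{k \times k}$ to be the product

$$A_{(g|h)} := A_{(g_q|g_{q-1})} \cdots A_{(g_2|g_1)} A_{(g_1|h)}.$$

Similarly, for any $x \in \mathrm{Descendants}_T(h) \cap V_{obs}$ such that the (directed) path from $h$ to $x$ is
$h \to g_1 \to g_2 \to \cdots \to g_q \to x$, define $C_{(x|h)} \in \mathbb{R}^{d \times k}$ to be the product

$$C_{(x|h)} := C_{(x|g_q)} A_{(g_q|g_{q-1})} \cdots A_{(g_2|g_1)} A_{(g_1|h)}.$$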

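B.1 $\log\det_k$ metric

Define the function $\mu : V_T \times V_T \to \mathbb{R}$ by

\[
\mu(u,v) :=
\begin{cases}
\log\det_k\bigl(\mathbb{E}[uu^\top]^{-1/2}\,\mathbb{E}[uv^\top]\,\mathbb{E}[vv^\top]^{-1/2}\bigr) & \text{if } u, v \in V_{\mathrm{hid}}, \\
\log\det_k\bigl(\mathbb{E}[uv^\top]\,\mathbb{E}[vv^\top]^{-1/2}\bigr) & \text{if } u \in V_{\mathrm{obs}},\ v \in V_{\mathrm{hid}}, \\
\log\det_k\bigl(\mathbb{E}[uu^\top]^{-1/2}\,\mathbb{E}[uv^\top]\bigr) & \text{if } u \in V_{\mathrm{hid}},\ v \in V_{\mathrm{obs}}, \\
\log\det_k\bigl(\mathbb{E}[uv^\top]\bigr) & \text{if } u, v \in V_{\mathrm{obs}}.
\end{cases}
\]

Proposition 1 ($\log\det_k$ metric). Assume Conditions 1 and 2 hold, and pick any $u, v \in V_T$. If $w \in V_T \setminus \{u, v\}$ is on the (undirected) path $u \leadsto v$, then $\mu(u,v) = \mu(u,w) + \mu(w,v)$.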

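Proof. Suppose the induced topology over $u, v, w$ in $T$ is the following.

[Figure: directed path $u \to w \to v$.]

Assume for now that $u, v \in V_{\mathrm{hid}}$. Then, using Condition 1,

\[ \mathbb{E}[uv^\top] = \mathbb{E}[uw^\top] A_{(v|w)}^\top = \bigl(\mathbb{E}[uw^\top]\mathbb{E}[ww^\top]^{-1/2}\bigr)\bigl(\mathbb{E}[ww^\top]^{-1/2}\mathbb{E}[wv^\top]\bigr), \]

so, because $\mathrm{rank}(\mathbb{E}[uu^\top]^{-1/2}\mathbb{E}[uw^\top]\mathbb{E}[ww^\top]^{-1/2}) = \mathrm{rank}(\mathbb{E}[ww^\top]^{-1/2}\mathbb{E}[wv^\top]\mathbb{E}[vv^\top]^{-1/2}) = k$ by Condition 2,

\begin{align*}
\mu(u,v) &= \log\det_k\bigl(\mathbb{E}[uu^\top]^{-1/2}\mathbb{E}[uw^\top]\mathbb{E}[ww^\top]^{-1/2}\,\mathbb{E}[ww^\top]^{-1/2}\mathbb{E}[wv^\top]\mathbb{E}[vv^\top]^{-1/2}\bigr) \\
&= \mu(u,w) + \mu(w,v).
\end{align*}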
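If $u \in V_{\mathrm{hid}}$ but $v \in V_{\mathrm{obs}}$, then let $U_v \in \mathbb{R}^{d \times k}$ be a matrix of orthonormal left singular vectors of $C_{(v|w)}$. Then $\mathbb{E}[uv^\top] = (\mathbb{E}[uw^\top]\mathbb{E}[ww^\top]^{-1/2})(\mathbb{E}[ww^\top]^{-1/2}\mathbb{E}[wv^\top])$ as before, and

\begin{align*}
\det\nolimits_k\bigl(\mathbb{E}[uu^\top]^{-1/2}\mathbb{E}[uv^\top]\bigr)
&= \bigl|\det\bigl(\mathbb{E}[uu^\top]^{-1/2}\mathbb{E}[uv^\top]U_v\bigr)\bigr| \\
&= \bigl|\det\bigl(\mathbb{E}[uu^\top]^{-1/2}\mathbb{E}[uw^\top]\mathbb{E}[ww^\top]^{-1/2}\bigr)\bigr| \cdot \bigl|\det\bigl(\mathbb{E}[ww^\top]^{-1/2}\mathbb{E}[wv^\top]U_v\bigr)\bigr| \\
&= \det\nolimits_k\bigl(\mathbb{E}[uu^\top]^{-1/2}\mathbb{E}[uw^\top]\mathbb{E}[ww^\top]^{-1/2}\bigr) \cdot \det\nolimits_k\bigl(\mathbb{E}[ww^\top]^{-1/2}\mathbb{E}[wv^\top]U_v\bigr) \\
&= \det\nolimits_k\bigl(\mathbb{E}[uu^\top]^{-1/2}\mathbb{E}[uw^\top]\mathbb{E}[ww^\top]^{-1/2}\bigr) \cdot \det\nolimits_k\bigl(\mathbb{E}[ww^\top]^{-1/2}\mathbb{E}[wv^\top]\bigr),
\end{align*}

so

\[ \mu(u,v) = \log\det\nolimits_k\bigl(\mathbb{E}[uu^\top]^{-1/2}\mathbb{E}[uw^\top]\mathbb{E}[ww^\top]^{-1/2}\bigr) + \log\det\nolimits_k\bigl(\mathbb{E}[ww^\top]^{-1/2}\mathbb{E}[wv^\top]\bigr) = \mu(u,w) + \mu(w,v). \]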

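Suppose now that the induced topology over $u, v, w$ in $T$ is the following.

[Figure: $u \leftarrow w \to v$.]

Again, first assume that $u, v \in V_{\mathrm{hid}}$. Then, by Condition 1,

\[ \mathbb{E}[uv^\top] = A_{(u|w)}\,\mathbb{E}[ww^\top]\,A_{(v|w)}^\top = \bigl(\mathbb{E}[uw^\top]\mathbb{E}[ww^\top]^{-1/2}\bigr)\bigl(\mathbb{E}[ww^\top]^{-1/2}\mathbb{E}[wv^\top]\bigr), \]

so $\mu(u,v) = \mu(u,w) + \mu(w,v)$ as before. The cases where one or both of $u$ and $v$ are in $V_{\mathrm{obs}}$ follow by similar arguments as above. □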
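The additivity in Proposition 1 can be checked by hand in the scalar case $k = 1$, where $\det_k$ of a number is its absolute value and $\mu(u,v)$ for hidden $u, v$ reduces to the log of the absolute correlation. The following is a minimal sketch under assumed values: the chain $u \to w \to v$, the coefficients `a`, `b`, and the noise variances are all hypothetical and chosen only for illustration.

```python
import math

# Scalar (k = 1) chain u -> w -> v with zero-mean variables (hypothetical values):
#   w = a*u + noise_w,   v = b*w + noise_v,
# where each noise term is independent of everything upstream.
a, b = 0.8, -1.5
var_u, var_noise_w, var_noise_v = 2.0, 0.5, 1.0

# Exact second moments implied by the model.
E_uu = var_u
E_uw = a * E_uu
E_ww = a * a * E_uu + var_noise_w
E_wv = b * E_ww
E_uv = a * b * E_uu
E_vv = b * b * E_ww + var_noise_v


def mu(E_xy, E_xx, E_yy):
    """Hidden-hidden case of mu for scalars: log |E[xy]| / sqrt(E[xx] E[yy])."""
    return math.log(abs(E_xy) / math.sqrt(E_xx * E_yy))


mu_uv = mu(E_uv, E_uu, E_vv)
mu_uw = mu(E_uw, E_uu, E_ww)
mu_wv = mu(E_wv, E_ww, E_vv)

# w lies on the path from u to v, so mu should add up along the path.
print(mu_uv, mu_uw + mu_wv)
```

The two printed numbers agree up to floating-point rounding, which is the scalar instance of $\mu(u,v) = \mu(u,w) + \mu(w,v)$.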

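B.2 Proof of Lemma 1

By Proposition 1,

\begin{align*}
\det\nolimits_k(\mathbb{E}[z_1 z_3^\top]) \cdot \det\nolimits_k(\mathbb{E}[z_2 z_4^\top])
&= \exp\bigl(\mu(z_1,z_3) + \mu(z_2,z_4)\bigr) \\
&= \exp\bigl(\mu(z_1,h) + \mu(h,g) + \mu(g,z_3) + \mu(z_2,h) + \mu(h,g) + \mu(g,z_4)\bigr) \\
&= \exp\bigl(\mu(z_1,h) + \mu(h,g) + \mu(g,z_4) + \mu(z_2,h) + \mu(h,g) + \mu(g,z_3)\bigr) \\
&= \exp\bigl(\mu(z_1,z_4) + \mu(z_2,z_3)\bigr) \\
&= \det\nolimits_k(\mathbb{E}[z_1 z_4^\top]) \cdot \det\nolimits_k(\mathbb{E}[z_2 z_3^\top]).
\end{align*}

Moreover,

\begin{align*}
\frac{\det\nolimits_k(\mathbb{E}[z_1 z_3^\top]) \cdot \det\nolimits_k(\mathbb{E}[z_2 z_4^\top])}{\det\nolimits_k(\mathbb{E}[z_1 z_2^\top]) \cdot \det\nolimits_k(\mathbb{E}[z_3 z_4^\top])}
&= \frac{\exp\bigl(\mu(z_1,z_3) + \mu(z_2,z_4)\bigr)}{\exp\bigl(\mu(z_1,z_2) + \mu(z_3,z_4)\bigr)} \\
&= \frac{\exp\bigl(\mu(z_1,h) + \mu(h,g) + \mu(g,z_3) + \mu(z_2,h) + \mu(h,g) + \mu(g,z_4)\bigr)}{\exp\bigl(\mu(z_1,h) + \mu(h,z_2) + \mu(z_3,g) + \mu(g,z_4)\bigr)} \\
&= \exp\bigl(2\mu(h,g)\bigr) \\
&= \det\bigl(\mathbb{E}[hh^\top]^{-1/2}\mathbb{E}[hg^\top]\mathbb{E}[gg^\top]^{-1/2}\bigr)^2 \\
&= \frac{\det(\mathbb{E}[hg^\top])^2}{\det(\mathbb{E}[hh^\top]) \cdot \det(\mathbb{E}[gg^\top])}.
\end{align*}

Finally, note that $u^\top \mathbb{E}[hh^\top]^{-1/2}\mathbb{E}[hg^\top]\mathbb{E}[gg^\top]^{-1/2} v \le \|u\|\|v\|$ for all vectors $u$ and $v$ by Cauchy-Schwarz, so

\[ \frac{\det(\mathbb{E}[hg^\top])^2}{\det(\mathbb{E}[hh^\top]) \cdot \det(\mathbb{E}[gg^\top])} \le 1 \]

as required. □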
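As a sanity check of the two identities just derived, here is a small scalar ($k = 1$) sketch in which $\det_k$ is simply the absolute value. The model, with $h$ as the root, $g$ a child of $h$, $z_1, z_2$ children of $h$, and $z_3, z_4$ children of $g$, and all numeric values are assumptions made up for illustration.

```python
# Scalar quartet model (hypothetical values), all variables zero-mean:
#   g  = c*h + noise,  z1 = a1*h + noise,  z2 = a2*h + noise,
#   z3 = a3*g + noise, z4 = a4*g + noise.
var_h, var_noise_g = 1.3, 0.7
c = 0.6
a1, a2, a3, a4 = 0.9, -1.1, 0.5, 1.4

E_hh = var_h
E_hg = c * var_h
E_gg = c * c * var_h + var_noise_g

# Cross second moments between observed variables; the observation noise of
# each z_i does not enter cross moments, so it is not needed here.
E12 = a1 * a2 * E_hh
E34 = a3 * a4 * E_gg
E13 = a1 * a3 * E_hg
E24 = a2 * a4 * E_hg
E14 = a1 * a4 * E_hg
E23 = a2 * a3 * E_hg

# First identity: the two "wrong" pairings have equal determinant products.
lhs = abs(E13) * abs(E24)
rhs = abs(E14) * abs(E23)

# Second identity: ratio against the correct pairing equals the squared
# correlation between h and g, which is at most 1.
ratio = lhs / (abs(E12) * abs(E34))
rho_sq = E_hg ** 2 / (E_hh * E_gg)

print(lhs, rhs, ratio, rho_sq)
```

The quartet test exploits exactly this gap: the correct pairing has the largest product unless $h$ and $g$ are perfectly correlated.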

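Note that if Condition 3 also holds, then Lemma 1 implies the strict inequalities

\[ \max\bigl\{\det\nolimits_k(\mathbb{E}[z_1 z_3^\top]) \cdot \det\nolimits_k(\mathbb{E}[z_2 z_4^\top]),\ \det\nolimits_k(\mathbb{E}[z_1 z_4^\top]) \cdot \det\nolimits_k(\mathbb{E}[z_2 z_3^\top])\bigr\} < \det\nolimits_k(\mathbb{E}[z_1 z_2^\top]) \cdot \det\nolimits_k(\mathbb{E}[z_3 z_4^\top]). \]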
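B.3 Proof of Lemma 2

Given that (2) holds for all pairs $\{i,j\}$ and all $s \in \{1, 2, \dots, k\}$, if the spectral quartet test returns a pairing $\{\{z_i, z_j\}, \{z_{i'}, z_{j'}\}\}$, it must be that

\begin{align*}
\prod_{s=1}^{k} \sigma_s(\mathbb{E}[z_i z_j^\top])\,\sigma_s(\mathbb{E}[z_{i'} z_{j'}^\top])
&\ge \prod_{s=1}^{k} [\sigma_s(\hat\Sigma_{i,j}) - \Delta_{i,j}]_+ [\sigma_s(\hat\Sigma_{i',j'}) - \Delta_{i',j'}]_+ \\
&> \prod_{s=1}^{k} (\sigma_s(\hat\Sigma_{i',j}) + \Delta_{i',j})(\sigma_s(\hat\Sigma_{i,j'}) + \Delta_{i,j'})
\ge \prod_{s=1}^{k} \sigma_s(\mathbb{E}[z_{i'} z_j^\top])\,\sigma_s(\mathbb{E}[z_i z_{j'}^\top]).
\end{align*}

Therefore

\begin{align*}
\det\nolimits_k(\mathbb{E}[z_i z_j^\top]) \cdot \det\nolimits_k(\mathbb{E}[z_{i'} z_{j'}^\top])
&= \prod_{s=1}^{k} \sigma_s(\mathbb{E}[z_i z_j^\top])\,\sigma_s(\mathbb{E}[z_{i'} z_{j'}^\top]) \\
&> \prod_{s=1}^{k} \sigma_s(\mathbb{E}[z_{i'} z_j^\top])\,\sigma_s(\mathbb{E}[z_i z_{j'}^\top])
= \det\nolimits_k(\mathbb{E}[z_{i'} z_j^\top]) \cdot \det\nolimits_k(\mathbb{E}[z_i z_{j'}^\top]).
\end{align*}

But by Lemma 1, the above inequality can only hold if $\{\{z_i, z_j\}, \{z_{i'}, z_{j'}\}\} = \{\{z_1, z_2\}, \{z_3, z_4\}\}$. □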
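The comparison used in this proof can be sketched in code. The block below is an illustrative reading of the test, not the paper's Algorithm 1 verbatim: it assumes the singular values of the empirical cross-moment matrices have already been computed, and it conservatively requires the lower confidence bound of a candidate pairing to beat the upper confidence bound of both alternative pairings. The names `spectral_quartet_test`, `sing_vals`, and `delta` are hypothetical.

```python
PAIRINGS = [((1, 2), (3, 4)), ((1, 3), (2, 4)), ((1, 4), (2, 3))]


def lower_product(values, width, k):
    """prod over the top k singular values of [sigma_s - width]_+ ."""
    result = 1.0
    for sigma in values[:k]:
        result *= max(sigma - width, 0.0)
    return result


def upper_product(values, width, k):
    """prod over the top k singular values of (sigma_s + width)."""
    result = 1.0
    for sigma in values[:k]:
        result *= sigma + width
    return result


def spectral_quartet_test(sing_vals, delta, k):
    """Return a pairing such as ((1, 2), (3, 4)), or None for 'no decision'.

    sing_vals[(i, j)]: singular values of the empirical matrix for the pair
        (i, j) with i < j, sorted in decreasing order (assumed precomputed).
    delta[(i, j)]: confidence-interval width for that pair.
    """
    for pairing in PAIRINGS:
        p, q = pairing
        low = (lower_product(sing_vals[p], delta[p], k)
               * lower_product(sing_vals[q], delta[q], k))
        alternatives = [alt for alt in PAIRINGS if alt != pairing]
        if all(low > upper_product(sing_vals[a], delta[a], k)
               * upper_product(sing_vals[b], delta[b], k)
               for a, b in alternatives):
            return pairing
    return None


# Hypothetical inputs with k = 2: pairs (1,2) and (3,4) are strongly
# correlated, while pairs crossing the bottleneck are weaker.
pairs = [(1, 2), (3, 4), (1, 3), (2, 4), (1, 4), (2, 3)]
sv = {(1, 2): [1.0, 0.8], (3, 4): [0.9, 0.7]}
for cross in [(1, 3), (2, 4), (1, 4), (2, 3)]:
    sv[cross] = [0.4, 0.3]

tight = {p: 0.01 for p in pairs}
loose = {p: 0.5 for p in pairs}

decision_tight = spectral_quartet_test(sv, tight, k=2)
decision_loose = spectral_quartet_test(sv, loose, k=2)
print(decision_tight, decision_loose)
```

With narrow intervals the sketch returns the pairing `((1, 2), (3, 4))`; with wide intervals no pairing is certified and it returns `None`, which plays the role of $\bot$. At most one pairing can ever pass, since two passing pairings would each need a lower bound exceeding the other's upper bound.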

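B.4 Proof of Lemma 4

Let $\Sigma_{i,j} := \mathbb{E}[z_i z_j^\top]$. The assumptions in the statement of the lemma imply

\[ \max\{\Delta_{1,2}, \Delta_{3,4}\} < \frac{\epsilon_0}{8k} \min\{\sigma_k(\Sigma_{1,2}), \sigma_k(\Sigma_{3,4})\} \]

where $\epsilon_0 := \min\{1/\rho - 1,\ 1\}$. Therefore

\begin{align*}
\prod_{s=1}^{k} [\sigma_s(\hat\Sigma_{1,2}) - \Delta_{1,2}]_+ [\sigma_s(\hat\Sigma_{3,4}) - \Delta_{3,4}]_+
&\ge \prod_{s=1}^{k} [\sigma_s(\Sigma_{1,2}) - 2\Delta_{1,2}]_+ [\sigma_s(\Sigma_{3,4}) - 2\Delta_{3,4}]_+ \\
&> \Bigl(\prod_{s=1}^{k} \sigma_s(\Sigma_{1,2})\,\sigma_s(\Sigma_{3,4})\Bigr)(1 - \epsilon_0/2). \tag{5}
\end{align*}

If $\mathbb{E}[hg^\top]$ has rank $k$, then so do $\Sigma_{i,j}$ for $i \in \{1,2\}$ and $j \in \{3,4\}$. Therefore, for $\{i',j'\} = \{1,2,3,4\} \setminus \{i,j\}$,

\[ \max\{\Delta_{i,j}, \Delta_{i',j'}\} < \frac{\epsilon_0}{8k} \min\{\sigma_k(\Sigma_{i,j}), \sigma_k(\Sigma_{i',j'})\}. \]

This implies

\begin{align*}
\prod_{s=1}^{k} (\sigma_s(\hat\Sigma_{i,j}) + \Delta_{i,j})(\sigma_s(\hat\Sigma_{i',j'}) + \Delta_{i',j'})
&\le \prod_{s=1}^{k} (\sigma_s(\Sigma_{i,j}) + 2\Delta_{i,j})(\sigma_s(\Sigma_{i',j'}) + 2\Delta_{i',j'}) \\
&\le \Bigl(\prod_{s=1}^{k} \sigma_s(\Sigma_{i,j})\,\sigma_s(\Sigma_{i',j'})\Bigr)(1 + \epsilon_0). \tag{6}
\end{align*}

Therefore, combining (5), (6), and Lemma 1,

\begin{align*}
\prod_{s=1}^{k} [\sigma_s(\hat\Sigma_{1,2}) - \Delta_{1,2}]_+ [\sigma_s(\hat\Sigma_{3,4}) - \Delta_{3,4}]_+
&> \frac{1 - \epsilon_0/2}{1 + \epsilon_0} \cdot \frac{\det(\mathbb{E}[hh^\top])\det(\mathbb{E}[gg^\top])}{\det(\mathbb{E}[hg^\top])^2} \prod_{s=1}^{k} (\sigma_s(\hat\Sigma_{i,j}) + \Delta_{i,j})(\sigma_s(\hat\Sigma_{i',j'}) + \Delta_{i',j'}) \\
&\ge \frac{1}{(1 + \epsilon_0)^2} \cdot \frac{\det(\mathbb{E}[hh^\top])\det(\mathbb{E}[gg^\top])}{\det(\mathbb{E}[hg^\top])^2} \prod_{s=1}^{k} (\sigma_s(\hat\Sigma_{i,j}) + \Delta_{i,j})(\sigma_s(\hat\Sigma_{i',j'}) + \Delta_{i',j'}) \\
&\ge \prod_{s=1}^{k} (\sigma_s(\hat\Sigma_{i,j}) + \Delta_{i,j})(\sigma_s(\hat\Sigma_{i',j'}) + \Delta_{i',j'}),
\end{align*}

so the spectral quartet test will return the correct pairing $\{\{z_1, z_2\}, \{z_3, z_4\}\}$, proving the lemma. □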

B.5 Conditions for returning a correct pairing when $\mathrm{rank}(\mathbb{E}[hg^\top]) < k$

The spectral quartet test is also useful in the case where $\mathbb{E}[hg^\top]$ has rank $r < k$. In this case, the widths of the confidence intervals are allowed to be wider than in the case where $\mathrm{rank}(\mathbb{E}[hg^\top]) = k$. Define



\[ \sigma_{\min} := \min\bigl(\{\sigma_k(\Sigma_{1,2}), \sigma_k(\Sigma_{3,4})\} \cup \{\sigma_r(\Sigma_{i,j}) : i \in \{1,2\},\ j \in \{3,4\}\}\bigr), \]

nLl ^^(^1.2)^5(1^3,4) 

Instead of depending on $\min_{i,j}\{\sigma_k(\Sigma_{i,j})\}$ and $\rho$ as in the case where $\mathrm{rank}(\mathbb{E}[hg^\top]) = k$, we only depend on $\sigma_{\min}$ and $\rho_1$.

Lemma 7 (Correct pairing, rank $r < k$). Suppose that (i) the observed variables $Z = \{z_1, z_2, z_3, z_4\}$ have the true induced (undirected) topology shown in Figure 1(a), (ii) the tree model satisfies Condition 1 and Condition 2, (iii) $\mathbb{E}[hg^\top]$ has rank $r < k$, and (iv) the confidence bounds in (2) hold for all $\{i,j\}$ and all $s \in [k]$. If

1 . f_ ..f± ' ' 

.2pi, 



Aij < -^ ■ min •( 1, 8fc 



for each $\{i,j\}$, then Algorithm 1 returns the correct pairing $\{\{z_1, z_2\}, \{z_3, z_4\}\}$.

Note that the allowed width increases (to a point) as the rank $r$ decreases.

Proof. The assumptions in the statement of the lemma imply

ClCmin 



where 



We have 



max{A,_j- : {i,j} C [4]} < 



ei := min < 8fc • ( - — j , 1 



k f ^ \ 

J] [(7,(^1,2) - Ai,2]+[a,(^3,4) - A3,4]+ > n -^s (^1,2)^.(^3,4) (1 - ei/2) 

s=l \s=l / 
as in the proof of Lemma 4. Moreover,

fe 

5-1 



eiCmi 



nun 



2(k-r) 



t k 



pI /I , n /elC^mi"^2(''-'■) 



<(n-^(^i,>^(^3,4) •(^^-;ji(^-(i+o 



^s=l 
k 



n^s (^1,2)^1.(^:3,4) •/'?-(i+^i)-(§) 



£^s 2(fe-r) 



\s=l 

< m[a,(A.2)-Ai,2]+[a,(^3.4)-A3,4]+j •'°?-(l + ^i)'-(^)'^'"''^ 

< [][a,(^i,2) - Ai,2] + [C7,(Z'3,4) - A3,4] + . 
s=l 

Therefore the spectral quartet test will return the correct pairing $\{\{z_1, z_2\}, \{z_3, z_4\}\}$; the lemma follows. □

C Analysis of Spectral Recursive Grouping 

C.l Overview 

Here is an outline of the argument for Theorem 1.

1. First, we condition on a $1 - \eta$ probability event over the iid samples from the distribution over $V_{\mathrm{obs}}$ in which the empirical second-moment matrices are sufficiently close to the true second-moment matrices in spectral norm (Equation (8)). This is required to reason deterministically about the behavior of the algorithm.

2. Next, we characterize the pairs $\{u,v\} \subseteq \mathcal{R}$ (where $\mathcal{R}$ are the roots of subtrees maintained by the algorithm) that cause the Mergeable subroutine to return true (Lemma 11), as well as those that cause it to return false (Lemma 12).

3. We use the above characterizations to show that the main while-loop of the algorithm maintains loop invariants such that when the loop finally terminates, the entire tree structure will have been completely discovered (Lemma 13). This is achieved by showing each iteration of the while-loop

(a) selects a "Mergeable" pair $\{u,v\} \subseteq \mathcal{R}$ that satisfies certain properties (Claim 2 and Claim 3) such that, if they are properly combined (as siblings or parent/child), the required loop invariants will be preserved; and

(b) uses the Relationship subroutine to correctly determine whether the chosen pair $\{u,v\}$ should be combined as siblings or parent/child (Claim 4).

C.2 Proof of Theorem 1

Recall the definitions of $A_{(g|h)} \in \mathbb{R}^{k \times k}$ and $C_{(x|h)} \in \mathbb{R}^{d \times k}$ for descendants $g \in \mathrm{Descendants}_T(h) \cap V_{\mathrm{hid}}$ and $x \in \mathrm{Descendants}_T(h) \cap V_{\mathrm{obs}}$ in $T$, as given in Appendix B.

Let us define



• , 1 1 i\ 7min/7. 

emin := mm < 1, 1 ^ , 



Pniax J ^A^ -r 'Tmin/Tniax 

1 + £ 7max 

The sample size requirement ensures that 

8k - 

This implies conditions on the thresholds $\Delta_{x_i,x_j}$ in Lemma 4 for the spectral quartet test on $\{x_1, x_2, x_3, x_4\}$ to return a correct pairing, provided that

\[ \min\{\sigma_k(\Sigma_{x_i,x_j}) : \{i,j\} \subset \{1,2,3,4\}\} \ge \varsigma. \tag{7} \]

The probabilistic event we need is that in which the confidence bounds from Lemma 5 hold for each pair of observed variables. The event

\[ \forall \{x_i, x_j\} \subseteq V_{\mathrm{obs}} : \ \|\hat\Sigma_{x_i,x_j} - \Sigma_{x_i,x_j}\| \le \Delta_{x_i,x_j} \tag{8} \]

occurs with probability at least $1 - \eta$ by Lemma 5 and a union bound. We henceforth condition on the above event.

The following is an immediate consequence of Weyl's Theorem and conditioning on the above event.

Lemma 8. Fix any pair $\{x,y\} \subseteq V_{\mathrm{obs}}$. If $\sigma_k(\Sigma_{x,y}) \ge (1+\epsilon)\theta$, then $\sigma_k(\hat\Sigma_{x,y}) \ge \theta$. If $\sigma_k(\hat\Sigma_{x,y}) \ge \theta$, then $\sigma_k(\Sigma_{x,y}) \ge (1-\epsilon)\theta$.

Before continuing, we need some definitions and notation. First, we refer to the variables in $V_T$ interchangeably as both nodes and variables. Next, we generally ignore the direction of edges in $T$, except when it becomes crucial (namely, in Lemma 10). For a node $r$ in $T$, we say that a subtree $\mathcal{T}[r]$ of $T$ (ignoring edge directions) is rooted at $r$ if $\mathcal{T}[r]$ contains $r$, and for every node $u$ in $\mathcal{T}[r]$ and any node $v$ not in $\mathcal{T}[r]$, the (undirected) path from $u$ to $v$ in $T$ passes through $r$. Note that a rooted subtree naturally implies parent/child relationships between its constituent nodes, and it is in this sense that we use the terms "parent", "child", "sibling", etc. throughout the analysis, rather than in the sense given by the edge directions in $T$ (the exception is in Lemma 10). A collection $\mathcal{C}$ of disjoint rooted subtrees of $T$ naturally gives rise to a super-tree $ST[\mathcal{C}]$ by starting with $T$ and then collapsing each $\mathcal{T}[r] \in \mathcal{C}$ into a single node. Note that each node in $ST[\mathcal{C}]$ is either associated with a subtree in $\mathcal{C}$, or is a node in $T$ that does not appear in any subtree in $\mathcal{C}$. We say a subtree $\mathcal{T}[r] \in \mathcal{C}$ is a leaf component relative to $\mathcal{C}$ if it is a leaf in this super-tree $ST[\mathcal{C}]$. Finally, define $V_{\mathrm{hid}}[\mathcal{C}] := \{h \in V_{\mathrm{hid}} : h \text{ does not appear in any subtree in } \mathcal{C}\}$.
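To make the super-tree and leaf-component definitions concrete, the following sketch collapses a collection of disjoint rooted subtrees and reports which of them are leaves of the resulting super-tree. The function names, the adjacency-dictionary representation, and the example tree are all assumptions introduced for illustration; nodes are plain strings so they cannot collide with the tuple labels used for collapsed subtrees.

```python
def super_tree(adj, subtrees):
    """Collapse each subtree into one node and return the new adjacency.

    adj: dict mapping each node of T to the set of its neighbors.
    subtrees: dict mapping a root r to the set of nodes of its rooted
        subtree (the sets are assumed to be disjoint).
    Collapsed subtrees are labeled ("sub", r); other nodes keep their name.
    """
    owner = {}
    for root, nodes in subtrees.items():
        for node in nodes:
            owner[node] = ("sub", root)

    collapsed = {}
    for node, neighbors in adj.items():
        a = owner.get(node, node)
        collapsed.setdefault(a, set())
        for other in neighbors:
            b = owner.get(other, other)
            if a != b:
                collapsed[a].add(b)
    return collapsed


def leaf_components(adj, subtrees):
    """Roots r whose subtree has degree one in the super-tree."""
    collapsed = super_tree(adj, subtrees)
    return {label[1] for label, nbrs in collapsed.items()
            if isinstance(label, tuple) and len(nbrs) == 1}


# Hypothetical tree: hidden nodes h1, h2, h3 and observed leaves x1..x5.
edges = [("h1", "x1"), ("h1", "x2"), ("h1", "h2"), ("h2", "x3"),
         ("h2", "h3"), ("h3", "x4"), ("h3", "x5")]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

# Collection C: four single-leaf subtrees plus one subtree rooted at h2.
subtrees = {"x1": {"x1"}, "x2": {"x2"}, "x4": {"x4"}, "x5": {"x5"},
            "h2": {"h2", "x3"}}

leaves = leaf_components(adj, subtrees)
remaining_hidden = {n for n in super_tree(adj, subtrees) if isinstance(n, str)}
print(sorted(leaves), sorted(remaining_hidden))
```

In this example the subtree rooted at `h2` touches both `h1` and `h3` in the super-tree, so it is not a leaf component, while the four single-leaf subtrees are. The uncollapsed nodes `h1` and `h3` make up $V_{\mathrm{hid}}[\mathcal{C}]$.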

The following lemma is a simple fact about the super-tree given properties on the subtrees (which will be maintained by the algorithm).

Lemma 9 (Super-tree property). Let $\mathcal{R} \subseteq V_T$. Let $\mathcal{C} := \{\mathcal{T}[u] : u \in \mathcal{R}\}$ be a collection of disjoint rooted subtrees, with $u$ being the root of $\mathcal{T}[u]$, such that their leaf sets $\{\mathcal{L}[u] : u \in \mathcal{R}\}$ partition $V_{\mathrm{obs}}$. Then the nodes of the super-tree $ST[\mathcal{C}]$ are $\mathcal{C} \cup V_{\mathrm{hid}}[\mathcal{C}]$, and the leaves of $ST[\mathcal{C}]$ are all in $\mathcal{C}$.

Proof. This follows because each leaf in $T$ appears in the leaf set of some $\mathcal{T}[u]$. □

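The next lemma relates the correlation between two observed variables in a quartet (on opposite sides of the bottleneck) to the correlations of the other pairs crossing the bottleneck.

Lemma 10 (Correlation transfer). Consider the following induced (undirected) topology over $\{z_1, z_2, z_3, z_4\} \subseteq V_{\mathrm{obs}}$.

[Figure: quartet topology with $z_1, z_2$ attached to $h$, $z_3, z_4$ attached to $g$, and $h$ joined to $g$.]

Then

\[ \sigma_k(\mathbb{E}[z_1 z_4^\top]) \ge \frac{\sigma_k(\mathbb{E}[z_1 z_3^\top]) \cdot \sigma_k(\mathbb{E}[z_2 z_4^\top])}{\sigma_1(\mathbb{E}[z_2 z_3^\top])}. \]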

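Proof. In this proof, the edge directions and the notion of ancestor are determined according to the edge directions in $T$. Let $r$ be the least common ancestor of $\{z_1, z_2, z_3, z_4\}$ in $T$. There are effectively three possible cases to consider, depending on the location of $r$ relative to the $z_i$, $h$, and $g$; we may exploit the fact that $\sigma_k(\mathbb{E}[z_1 z_4^\top]) = \sigma_k(\mathbb{E}[z_4 z_1^\top])$ to cover the remaining cases.

1. Suppose $r$ appears between $h$ and $z_1$.

By Condition 2, we can choose matrices $U_1, U_2, U_3, U_4 \in \mathbb{R}^{d \times k}$ such that the columns of $U_1$ are an orthonormal basis of $\mathrm{range}(C_{(z_1|r)})$, the columns of $U_2$ are an orthonormal basis of $\mathrm{range}(C_{(z_2|h)})$, the columns of $U_3$ are an orthonormal basis of $\mathrm{range}(C_{(z_3|g)})$, and the columns of $U_4$ are an orthonormal basis of $\mathrm{range}(C_{(z_4|g)})$. We have

\begin{align*}
U_1^\top \mathbb{E}[z_1 z_4^\top] U_4
&= U_1^\top C_{(z_1|r)} \mathbb{E}[rr^\top] A_{(h|r)}^\top C_{(z_4|h)}^\top U_4 \\
&= \bigl(U_1^\top C_{(z_1|r)} \mathbb{E}[rr^\top] A_{(h|r)}^\top C_{(z_3|h)}^\top U_3\bigr)\bigl(U_2^\top C_{(z_2|h)} \mathbb{E}[hh^\top] C_{(z_3|h)}^\top U_3\bigr)^{-1}\bigl(U_2^\top C_{(z_2|h)} \mathbb{E}[hh^\top] C_{(z_4|h)}^\top U_4\bigr) \\
&= \bigl(U_1^\top \mathbb{E}[z_1 z_3^\top] U_3\bigr)\bigl(U_2^\top \mathbb{E}[z_2 z_3^\top] U_3\bigr)^{-1}\bigl(U_2^\top \mathbb{E}[z_2 z_4^\top] U_4\bigr).
\end{align*}

2. Suppose $r$ appears between $h$ and $z_2$.

By Condition 2, we can choose matrices $U_1, U_2, U_3, U_4 \in \mathbb{R}^{d \times k}$ such that the columns of $U_1$ are an orthonormal basis of $\mathrm{range}(C_{(z_1|h)})$, the columns of $U_2$ are an orthonormal basis of $\mathrm{range}(C_{(z_2|r)})$, the columns of $U_3$ are an orthonormal basis of $\mathrm{range}(C_{(z_3|g)})$, and the columns of $U_4$ are an orthonormal basis of $\mathrm{range}(C_{(z_4|g)})$. We have

\begin{align*}
U_1^\top \mathbb{E}[z_1 z_4^\top] U_4
&= U_1^\top C_{(z_1|h)} \mathbb{E}[hh^\top] C_{(z_4|h)}^\top U_4 \\
&= \bigl(U_1^\top C_{(z_1|h)} \mathbb{E}[hh^\top] C_{(z_3|h)}^\top U_3\bigr)\bigl(U_2^\top C_{(z_2|r)} \mathbb{E}[rr^\top] A_{(h|r)}^\top C_{(z_3|h)}^\top U_3\bigr)^{-1}\bigl(U_2^\top C_{(z_2|r)} \mathbb{E}[rr^\top] C_{(z_4|r)}^\top U_4\bigr) \\
&= \bigl(U_1^\top \mathbb{E}[z_1 z_3^\top] U_3\bigr)\bigl(U_2^\top \mathbb{E}[z_2 z_3^\top] U_3\bigr)^{-1}\bigl(U_2^\top \mathbb{E}[z_2 z_4^\top] U_4\bigr).
\end{align*}

3. Suppose either $r = h$, or $r$ is between $h$ and $g$.

In either case, by Condition 2, we can choose matrices $U_1, U_2, U_3, U_4 \in \mathbb{R}^{d \times k}$ such that the columns of $U_1$ are an orthonormal basis of $\mathrm{range}(C_{(z_1|h)})$, the columns of $U_2$ are an orthonormal basis of $\mathrm{range}(C_{(z_2|h)})$, the columns of $U_3$ are an orthonormal basis of $\mathrm{range}(C_{(z_3|g)})$, and the columns of $U_4$ are an orthonormal basis of $\mathrm{range}(C_{(z_4|g)})$. We have

\begin{align*}
U_1^\top \mathbb{E}[z_1 z_4^\top] U_4
&= \bigl(U_1^\top C_{(z_1|r)} \mathbb{E}[rr^\top]\bigr)\bigl(U_2^\top C_{(z_2|r)} \mathbb{E}[rr^\top]\bigr)^{-1}\bigl(U_2^\top C_{(z_2|r)} \mathbb{E}[rr^\top]\bigr)\bigl(C_{(z_4|r)}^\top U_4\bigr) \\
&= \bigl(U_1^\top C_{(z_1|r)} \mathbb{E}[rr^\top] C_{(z_3|r)}^\top U_3\bigr)\bigl(U_2^\top C_{(z_2|r)} \mathbb{E}[rr^\top] C_{(z_3|r)}^\top U_3\bigr)^{-1}\bigl(U_2^\top C_{(z_2|r)} \mathbb{E}[rr^\top] C_{(z_4|r)}^\top U_4\bigr) \\
&= \bigl(U_1^\top \mathbb{E}[z_1 z_3^\top] U_3\bigr)\bigl(U_2^\top \mathbb{E}[z_2 z_3^\top] U_3\bigr)^{-1}\bigl(U_2^\top \mathbb{E}[z_2 z_4^\top] U_4\bigr).
\end{align*}

Therefore, in all cases,

\[ \sigma_k(\mathbb{E}[z_1 z_4^\top]) \ge \frac{\sigma_k(\mathbb{E}[z_1 z_3^\top]) \cdot \sigma_k(\mathbb{E}[z_2 z_4^\top])}{\sigma_1(\mathbb{E}[z_2 z_3^\top])}. \]

□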
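In the scalar case $k = 1$ every matrix above is a number, $\sigma_k$ and $\sigma_1$ coincide with the absolute value, and the factorization in the proof holds with equality. The sketch below checks this for case 1 (the root $r$ sits between $h$ and $z_1$) using a small helper that computes second moments in a rooted scalar linear model. The model dictionary, its coefficients, and the helper names are hypothetical and exist only for this illustration.

```python
# Each node maps to (parent, edge coefficient, noise variance). For the root,
# the third entry is its own variance. All values are made up.
model = {
    "r":  (None, None, 1.5),
    "z1": ("r", 0.9, 0.3),
    "h":  ("r", -0.7, 0.4),
    "z2": ("h", 1.2, 0.2),
    "g":  ("h", 0.6, 0.5),
    "z3": ("g", -1.1, 0.1),
    "z4": ("g", 0.8, 0.6),
}


def ancestors(node):
    """Return [node, parent, ..., root]."""
    chain = [node]
    while model[chain[-1]][0] is not None:
        chain.append(model[chain[-1]][0])
    return chain


def variance(node):
    """E[node^2] under node = coef * parent + independent noise."""
    parent, coef, noise = model[node]
    if parent is None:
        return noise
    return coef * coef * variance(parent) + noise


def gain(node, ancestor):
    """Product of edge coefficients on the directed path ancestor -> node."""
    product = 1.0
    while node != ancestor:
        parent, coef, _ = model[node]
        product *= coef
        node = parent
    return product


def second_moment(a, b):
    """E[a b] for two distinct-branch nodes, via their least common ancestor."""
    above_b = set(ancestors(b))
    lca = next(n for n in ancestors(a) if n in above_b)
    return gain(a, lca) * gain(b, lca) * variance(lca)


direct = second_moment("z1", "z4")
transferred = (second_moment("z1", "z3") * second_moment("z2", "z4")
               / second_moment("z2", "z3"))
print(direct, transferred)
```

For $k > 1$ the same factorization holds only after projecting with the $U_i$, and taking the smallest singular value of the product is what turns the equality into the inequality of Lemma 10.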



The next two lemmas (Lemmas 11 and 12) show a dichotomy in the cases that cause the subroutine Mergeable to return either true or false.

Lemma 11 (Mergeable pairs). Let $\mathcal{R} \subseteq V_T$. Let $\mathcal{C} := \{\mathcal{T}[r] : r \in \mathcal{R}\}$ be a collection of disjoint rooted subtrees, with $r$ being the root of $\mathcal{T}[r]$, such that their leaf sets $\{\mathcal{L}[r] : r \in \mathcal{R}\}$ partition $V_{\mathrm{obs}}$. Further, suppose the pair $\{u,v\} \subseteq \mathcal{R}$ is such that one of the following conditions holds.

1. $\{u,v\}$ share a common neighbor in $T$, and both of $\mathcal{T}[u]$ and $\mathcal{T}[v]$ are leaf components relative to $\mathcal{C}$.

2. $\{u,v\}$ are neighbors in $T$, and at least one of $\mathcal{T}[u]$ and $\mathcal{T}[v]$ is a leaf component relative to $\mathcal{C}$.

Then for all pairs $\{u_1,v_1\} \subseteq \mathcal{R} \setminus \{u,v\}$ and all $(x,y,x_1,y_1) \in \mathcal{L}[u] \times \mathcal{L}[v] \times \mathcal{L}[u_1] \times \mathcal{L}[v_1]$, SpectralQuartetTest($\{x,y,x_1,y_1\}$) returns $\{\{x,y\},\{x_1,y_1\}\}$ or $\bot$. This implies that Mergeable($\mathcal{R}, \mathcal{L}[\cdot], u, v$) returns true.

Remark 2. Note that if $|\mathcal{R}| < 4$, then Mergeable($\mathcal{R}, \mathcal{L}[\cdot], u, v$) returns true for all pairs $\{u,v\} \subseteq \mathcal{R}$.

Proof. Suppose the first condition holds, and let $h$ be the common neighbor. Since $\mathcal{T}[u]$ is a leaf component relative to $\mathcal{C}$, the (undirected) path from any node $u'$ in $\mathcal{T}[u]$ to another node $w$ not in $\mathcal{T}[u]$ must pass through $h$. Similarly, the (undirected) path from any node $v'$ in $\mathcal{T}[v]$ to another node $w$ not in $\mathcal{T}[v]$ must pass through $h$. Therefore, each choice of $\{u_1,v_1\} \subseteq \mathcal{R} \setminus \{u,v\}$ and $(x,y,x_1,y_1) \in \mathcal{L}[u] \times \mathcal{L}[v] \times \mathcal{L}[u_1] \times \mathcal{L}[v_1]$ induces one of the following topologies,

upon which, by Lemma 2, the quartet test returns either $\{\{x,y\},\{x_1,y_1\}\}$ or $\bot$.

Now instead suppose the second condition holds. Without loss of generality, assume $\mathcal{T}[u]$ is a leaf component relative to $\mathcal{C}$, which then implies that the (undirected) path from any node $u'$ in $\mathcal{T}[u]$ to another node $w$ not in $\mathcal{T}[u]$ must pass through $v$. Moreover, since $\mathcal{T}[v]$ is rooted at $v$, the (undirected) path from any node $v'$ in $\mathcal{T}[v]$ to another node $w$ not in $\mathcal{T}[v]$ must pass through $v$. If $\mathcal{T}[v]$ is also a leaf component, then it must be that $\mathcal{R} = \{u,v\}$, in which case $\mathcal{R} \setminus \{u,v\} = \emptyset$. If $\mathcal{T}[v]$ is not a leaf component, then each choice of $\{u_1,v_1\} \subseteq \mathcal{R} \setminus \{u,v\}$ and $(x,y,x_1,y_1) \in \mathcal{L}[u] \times \mathcal{L}[v] \times \mathcal{L}[u_1] \times \mathcal{L}[v_1]$ induces one of the following topologies,

upon which, by Lemma 2, the quartet test returns either $\{\{x,y\},\{x_1,y_1\}\}$ or $\bot$. □

Lemma 12 (Un-mergeable pairs). Let $\mathcal{R} \subseteq V_T$. Let $\mathcal{C} := \{\mathcal{T}[r] : r \in \mathcal{R}\}$ be a collection of disjoint rooted subtrees, with $r$ being the root of $\mathcal{T}[r]$, such that their leaf sets $\{\mathcal{L}[r] : r \in \mathcal{R}\}$ partition $V_{\mathrm{obs}}$. Further, suppose the pair $\{u,v\} \subseteq \mathcal{R}$ is such that all of the following conditions hold.

1. There exists $(x,y) \in \mathcal{L}[u] \times \mathcal{L}[v]$ such that $\sigma_k(\hat\Sigma_{x,y}) \ge \theta$.

2. $\{u,v\}$ do not share a common neighbor in $T$, or at least one of $\mathcal{T}[u]$ and $\mathcal{T}[v]$ is not a leaf component relative to $\mathcal{C}$.

3. $\{u,v\}$ are not neighbors in $T$, or neither $\mathcal{T}[u]$ nor $\mathcal{T}[v]$ is a leaf component relative to $\mathcal{C}$.

Then there exists a pair $\{u_1,v_1\} \subseteq \mathcal{R} \setminus \{u,v\}$ and $(x_1,y_1) \in \mathcal{L}[u_1] \times \mathcal{L}[v_1]$ such that SpectralQuartetTest($\{x,y,x_1,y_1\}$) returns $\{\{x,x_1\},\{y,y_1\}\}$. This implies that Mergeable($\mathcal{R}, \mathcal{L}[\cdot], u, v$) returns false.

Proof. First, take $(x,y) \in \mathcal{L}[u] \times \mathcal{L}[v]$ such that $\sigma_k(\hat\Sigma_{x,y}) \ge \theta$. By Lemma 8, $\sigma_k(\Sigma_{x,y}) \ge (1-\epsilon)\theta$. Lemma 9 implies that the nodes of $ST[\mathcal{C}]$ are $\mathcal{C} \cup V_{\mathrm{hid}}[\mathcal{C}]$, and that each leaf in $ST[\mathcal{C}]$ is a subtree $\mathcal{T}[u] \in \mathcal{C}$. The second and third conditions of the lemma on $\{u,v\}$ imply that at least one of the following cases holds.

(i) Neither $\mathcal{T}[u]$ nor $\mathcal{T}[v]$ is a leaf component relative to $\mathcal{C}$.

(ii) $u$ and $v$ are not neighbors and do not share a common neighbor.

(iii) $u$ and $v$ are not neighbors, and one of $\mathcal{T}[u]$ and $\mathcal{T}[v]$ is not a leaf component relative to $\mathcal{C}$.
Suppose (i) holds. Then each of $\mathcal{T}[u]$ and $\mathcal{T}[v]$ has degree $\ge 2$ in $ST[\mathcal{C}]$. Note that neither $u$ nor $v$ is a leaf in $T$. Moreover, there exists $\{u_1,v_1\} \subseteq (\mathcal{R} \setminus \{u,v\}) \cup V_{\mathrm{hid}}[\mathcal{C}]$ such that $u_1$ is adjacent to $u$ in $T$, $v_1$ is adjacent to $v$ in $T$, and the (undirected) path from $u_1$ to $v_1$ in $T$ intersects the (undirected) path from $u$ to $v$ in $T$.

Since $u$ is not a leaf, it has at least three neighbors by assumption, and thus there exist three subtrees $\{\mathcal{T}_{u,1}, \mathcal{T}_{u,2}, \mathcal{T}_{u,3}\} \subseteq \mathcal{F}_u$ such that $u_1$ is the root of $\mathcal{T}_{u,1}$, $x \in V_{\mathrm{obs}}[\mathcal{T}_{u,2}]$, and $y \in V_{\mathrm{obs}}[\mathcal{T}_{u,3}]$. Moreover, by Condition 4, there exist $x_1 \in V_{\mathrm{obs}}[\mathcal{T}_{u,1}]$, $x_2 \in V_{\mathrm{obs}}[\mathcal{T}_{u,2}]$, and $x_3 \in V_{\mathrm{obs}}[\mathcal{T}_{u,3}]$ such that $\sigma_k(\mathbb{E}[x_i x_j^\top]) \ge \gamma_{\min}$ for all $\{i,j\} \subset \{1,2,3\}$. Note that it is possible to have $x_2 = x$ and $x_3 = y$. Let $u_2$ denote the node in $\mathcal{T}_{u,2}$ at which the (undirected) paths $x \leadsto u$ and $x_2 \leadsto u$ intersect (if $x_2 = x$, then let $u_2$ be the root of $\mathcal{T}_{u,2}$); similarly, let $u_3$ denote the node in $\mathcal{T}_{u,3}$ at which the (undirected) paths $y \leadsto u$ and $x_3 \leadsto u$ intersect (if $x_3 = y$, then let $u_3$ be the root of $\mathcal{T}_{u,3}$). The induced (undirected) topology over these nodes is shown below.




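A completely analogous argument can be applied relative to $v$ instead of $u$, giving the following.

Claim 1. The following lower bounds hold.

\[ \min\{\sigma_k(\Sigma_{x_1,x}),\ \sigma_k(\Sigma_{x_1,y}),\ \sigma_k(\Sigma_{y_1,y}),\ \sigma_k(\Sigma_{y_1,x})\} \ge \frac{\gamma_{\min} \cdot (1-\epsilon)\theta}{\gamma_{\max}} = \varsigma. \tag{9} \]

Proof. We just show the inequalities for $\sigma_k(\mathbb{E}[x_1 x^\top])$ and $\sigma_k(\mathbb{E}[x_1 y^\top])$; the other two are analogous. If $x_2 = x$, then $\sigma_k(\mathbb{E}[x_1 x^\top]) = \sigma_k(\mathbb{E}[x_1 x_2^\top]) \ge \gamma_{\min} \ge \varsigma$. If $x_2 \ne x$, then Lemma 10 applies to the induced (undirected) topology over $\{x_1, y, x_2, x\}$, and therefore

\[ \sigma_k(\mathbb{E}[x_1 x^\top]) \ge \frac{\sigma_k(\mathbb{E}[x_1 x_2^\top]) \cdot \sigma_k(\mathbb{E}[y x^\top])}{\sigma_1(\mathbb{E}[y x_2^\top])} \ge \frac{\gamma_{\min} \cdot (1-\epsilon)\theta}{\gamma_{\max}} = \varsigma. \]

This gives the first claimed inequality; now we show the second. If $x_3 = y$, then $\sigma_k(\mathbb{E}[x_1 y^\top]) = \sigma_k(\mathbb{E}[x_1 x_3^\top]) \ge \gamma_{\min} \ge \varsigma$. If $x_3 \ne y$, then Lemma 10 applies to the induced (undirected) topology over $\{x_1, x, x_3, y\}$, and again

\[ \sigma_k(\mathbb{E}[x_1 y^\top]) \ge \frac{\sigma_k(\mathbb{E}[x_1 x_3^\top]) \cdot \sigma_k(\mathbb{E}[x y^\top])}{\sigma_1(\mathbb{E}[x x_3^\top])} \ge \frac{\gamma_{\min} \cdot (1-\epsilon)\theta}{\gamma_{\max}} = \varsigma. \]

□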

Claim 1, Lemma 4, and the sample size requirement of Theorem 1 (as per (7)) imply that the spectral quartet test on $\{x, x_1, y, y_1\}$ returns the correct pairing. Given the induced (undirected) topology, the correct pairing is $\{\{x,x_1\},\{y,y_1\}\}$. Because the leaf sets $\{\mathcal{L}[r] : r \in \mathcal{R}\}$ partition $V_{\mathrm{obs}}$, and because $x_1 \notin \mathcal{L}[u]$ and $y_1 \notin \mathcal{L}[v]$, there exists $\{u',v'\} \subseteq \mathcal{R} \setminus \{u,v\}$ such that $x_1 \in \mathcal{L}[u']$ and $y_1 \in \mathcal{L}[v']$. This proves the lemma in this case.

Now instead suppose (ii) holds. Since $T$ is connected, and $\mathcal{T}[u]$ and $\mathcal{T}[v]$ are respectively rooted at $u$ and $v$, there must exist a pair $\{u_1,v_1\} \subseteq (\mathcal{R} \setminus \{u,v\}) \cup V_{\mathrm{hid}}[\mathcal{C}]$ such that neither $u_1$ nor $v_1$ is a leaf in $T$, $u_1$ is adjacent to $u$ in $T$, $v_1$ is adjacent to $v$ in $T$, and the (undirected) path from $u$ to $v$ in $T$ passes through the path from $u_1$ to $v_1$.

An argument analogous to that in case (i) applies to prove the lemma in this case; we provide a brief sketch below. Because $u_1$ is not a leaf, there exist three subtrees $\{\mathcal{T}_{u_1,1}, \mathcal{T}_{u_1,2}, \mathcal{T}_{u_1,3}\} \subseteq \mathcal{F}_{u_1}$ such that $u$ is the root of $\mathcal{T}_{u_1,2}$ (so $x \in V_{\mathrm{obs}}[\mathcal{T}_{u_1,2}]$) and $y \in V_{\mathrm{obs}}[\mathcal{T}_{u_1,3}]$. Moreover, there exist $x_1 \in V_{\mathrm{obs}}[\mathcal{T}_{u_1,1}]$, $x_2 \in V_{\mathrm{obs}}[\mathcal{T}_{u_1,2}]$, and $x_3 \in V_{\mathrm{obs}}[\mathcal{T}_{u_1,3}]$ such that $\sigma_k(\mathbb{E}[x_i x_j^\top]) \ge \gamma_{\min}$ for all $\{i,j\} \subset \{1,2,3\}$ (it is possible to have $x_2 = x$ and $x_3 = y$). Let $u_1'$ denote the root of $\mathcal{T}_{u_1,1}$, $u_2'$ denote the node in $\mathcal{T}_{u_1,2}$ at which the (undirected) paths $x \leadsto u_1$ and $x_2 \leadsto u_1$ intersect (if $x_2 = x$, then let $u_2' = u$, which is the root of $\mathcal{T}_{u_1,2}$), and $u_3'$ denote the node in $\mathcal{T}_{u_1,3}$ at which the (undirected) paths $y \leadsto u_1$ and $x_3 \leadsto u_1$ intersect (if $x_3 = y$, then let $u_3'$ be the root of $\mathcal{T}_{u_1,3}$). An analogous argument applies relative to $v_1$ instead of $u_1$; the induced (undirected) topologies are given below.

Using the arguments in Claim 1, it can be shown that the inequalities in (9) hold in this case, so by Lemma 4, the quartet test on $\{x, x_1, y, y_1\}$ returns $\{\{x,x_1\},\{y,y_1\}\}$. Because the leaf sets $\{\mathcal{L}[r] : r \in \mathcal{R}\}$ partition $V_{\mathrm{obs}}$, and because $x_1 \notin \mathcal{L}[u] = V_{\mathrm{obs}}[\mathcal{T}_{u_1,2}]$ and $y_1 \notin \mathcal{L}[v] = V_{\mathrm{obs}}[\mathcal{T}_{v_1,2}]$, there exists $\{u',v'\} \subseteq \mathcal{R} \setminus \{u,v\}$ such that $x_1 \in \mathcal{L}[u']$ and $y_1 \in \mathcal{L}[v']$. This proves the lemma in this case.

Finally, suppose (iii) holds. Without loss of generality, assume $\mathcal{T}[u]$ is not a leaf component relative to $\mathcal{C}$. Since $T$ is connected, and $\mathcal{T}[u]$ and $\mathcal{T}[v]$ are respectively rooted at $u$ and $v$, there must exist $v_1 \in (\mathcal{R} \setminus \{u,v\}) \cup V_{\mathrm{hid}}[\mathcal{C}]$ such that $v_1$ is not a leaf in $T$, $v_1$ is adjacent to $v$ in $T$, and the (undirected) path from $u$ to $v$ in $T$ passes through $v_1$. Moreover, since $\mathcal{T}[u]$ is not a leaf component relative to $\mathcal{C}$, it has degree $\ge 2$ in $ST[\mathcal{C}]$. Note that $u$ is not a leaf in $T$, and moreover, there exists $u_1 \in (\mathcal{R} \setminus \{u,v\}) \cup V_{\mathrm{hid}}[\mathcal{C}]$ such that $u_1$ is adjacent to $u$ in $T$, and $u_1$ is not on the (undirected) path from $u$ to $v$.

Again, an argument analogous to that in case (i) applies now to prove the lemma in this case.
Finally, we give a lemma which analyzes the while-loop of Algorithm 2 and consequently implies Theorem 1.

Lemma 13 (Loop invariants). The following invariants concerning the state of the objects $(\mathcal{R}, \mathcal{T}[\cdot], \mathcal{L}[\cdot])$ hold before the while-loop in Algorithm 2 and after each iteration of the while-loop.

1. $\mathcal{R} \subseteq V_T$, and for each $u \in \mathcal{R}$, $\mathcal{T}[u]$ is a subtree of $T$ rooted at $u$. Moreover, the rooted subtree $\mathcal{T}[v]$ is already defined by Algorithm 2 for every node $v$ appearing in $\mathcal{T}[u]$ for some $u \in \mathcal{R}$. Finally, for each $u \in \mathcal{R}$, the subtree $\mathcal{T}[u]$ is formed by joining the subtrees $\mathcal{T}[v]$ corresponding to children $v$ of $u$ in $\mathcal{T}[u]$ via edges $\{u,v\}$.

2. The subtrees in $\mathcal{C} := \{\mathcal{T}[u] : u \in \mathcal{R}\}$ are disjoint, and the leaf sets $\{\mathcal{L}[u] : u \in \mathcal{R}\}$ partition $V_{\mathrm{obs}}$.

Moreover, no iteration of the while-loop terminates in failure.

Before proving Lemma 13, we show how it implies Theorem 1. Initially, $|\mathcal{R}| = n$, and each iteration of the while-loop decreases the cardinality of $\mathcal{R}$ by one, so there are a total of $n - 1$ iterations of the while-loop. By Lemma 13, the final iteration results in a set $\mathcal{R} = \{h\}$ such that $\hat{T} = \mathcal{T}[h]$ is a subtree of $T$ rooted at $h$, and $\mathcal{L}[h] = V_{\mathrm{obs}}$. This implies that $\hat{T}$ has the same (undirected) structure as $T$, as required. This completes the proof of Theorem 1.

Proof of Lemma 13. The loop invariants clearly hold before the while-loop with the initial settings of $\mathcal{R} = V_{\mathrm{obs}}$, $\mathcal{T}[x]$ = rooted single-node tree $x$, and $\mathcal{L}[x] = \{x\}$ for all $x \in \mathcal{R}$. So assume as the inductive hypothesis that the loop invariants hold at the start of a particular iteration (in which $|\mathcal{R}| > 1$). It remains to prove that the iteration does not terminate in failure, and that the loop invariants hold at the end of the iteration. Let $\mathcal{R}$, $\mathcal{T}[\cdot]$, and $\mathcal{L}[\cdot]$ be in their state at the beginning of the iteration.

Because the second loop invariant holds, Lemma 9 implies that the nodes of $ST[\mathcal{C}]$ are $\mathcal{C} \cup V_{\mathrm{hid}}[\mathcal{C}]$, and that each leaf in $ST[\mathcal{C}]$ is a subtree $\mathcal{T}[u] \in \mathcal{C}$ (so we may refer to the leaves of $ST[\mathcal{C}]$ as leaf components).

Claim 2. If $|\mathcal{R}| > 1$, then there exists a pair $\{u,v\} \subseteq \mathcal{R}$ such that the following hold.

1. Either $u$ and $v$ are neighbors in $T$, and at least one of $\mathcal{T}[u]$ or $\mathcal{T}[v]$ is a leaf component relative to $\mathcal{C}$; or $u$ and $v$ share a common neighbor in $V_{\mathrm{hid}}[\mathcal{C}]$, and both $\mathcal{T}[u]$ and $\mathcal{T}[v]$ are leaf components relative to $\mathcal{C}$.

2. Mergeable($\mathcal{R}, \mathcal{L}[\cdot], u, v$) = true.

3. $\max\{\sigma_k(\hat\Sigma_{x,y}) : (x,y) \in \mathcal{L}[u] \times \mathcal{L}[v]\} \ge \theta$.

Proof. Suppose there are no pairs $\{u,v\} \subseteq \mathcal{R}$ such that $u$ and $v$ are neighbors in $T$ and at least one of $\mathcal{T}[u]$ and $\mathcal{T}[v]$ is a leaf component relative to $\mathcal{C}$. Then each leaf component must be adjacent to some $h \in V_{\mathrm{hid}}[\mathcal{C}]$ in $ST[\mathcal{C}]$. Consider the tree $ST'$ obtained from $ST[\mathcal{C}]$ by removing all the leaf components in $ST[\mathcal{C}]$. The leaves of $ST'$ must be among the $h \in V_{\mathrm{hid}}[\mathcal{C}]$ that were adjacent to the leaf components in $ST[\mathcal{C}]$. Fix such a leaf $h$ in $ST'$, and observe that it has degree one in $ST'$. By assumption, no node in $T$ has degree two, so $h$ must have been connected to at least two leaf components in $ST[\mathcal{C}]$, say $\mathcal{T}[u]$ and $\mathcal{T}[v]$. The node $h$ is therefore a common neighbor of $u$ and $v$. This proves the existence of a pair $\{u,v\} \subseteq \mathcal{R}$ satisfying the first required property.



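Fix the pair $\{u,v\}$ specified above. By Lemma 11, Mergeable($\mathcal{R}, \mathcal{L}[\cdot], u, v$) returns true, so $\{u,v\}$ satisfies the second required property.

To show the final required property, we consider two cases. Suppose first that $u$ and $v$ are neighbors, and that $\mathcal{T}[u]$ is a leaf component relative to $\mathcal{C}$. Note that $u$ and $v$ cannot both be leaves in $T$. If $v$ is not a leaf, then there exist subtrees $\mathcal{T}_{v,1}$ and $\mathcal{T}_{v,2}$ in $\mathcal{F}_v$ such that $\mathcal{T}_{v,1} = \mathcal{T}[u]$ (because $\mathcal{T}[u]$ is a leaf component) and $\mathcal{T}_{v,2} = \mathcal{T}[v']$ for some child $v'$ of $v$ in $\mathcal{T}[v]$ (by the first loop invariant). By Condition 4, there exist $x \in V_{\mathrm{obs}}[\mathcal{T}_{v,1}] = \mathcal{L}[u]$ and $y \in V_{\mathrm{obs}}[\mathcal{T}_{v,2}] \subseteq \mathcal{L}[v]$ such that $\sigma_k(\Sigma_{x,y}) \ge \gamma_{\min} = (1+\epsilon)\theta$; by Lemma 8, $\sigma_k(\hat\Sigma_{x,y}) \ge \theta$. If $v$ is a leaf but $u$ is not, then there exist subtrees $\mathcal{T}_{u,1}$ and $\mathcal{T}_{u,2}$ in $\mathcal{F}_u$ such that $\mathcal{T}_{u,1} = v$ and $\mathcal{T}_{u,2} = \mathcal{T}[u']$ for some child $u'$ of $u$ in $\mathcal{T}[u]$ (by the first loop invariant). So by Condition 4, there exists $y \in V_{\mathrm{obs}}[\mathcal{T}_{u,2}] \subseteq \mathcal{L}[u]$ such that $\sigma_k(\Sigma_{v,y}) \ge \gamma_{\min} = (1+\epsilon)\theta$; by Lemma 8, $\sigma_k(\hat\Sigma_{v,y}) \ge \theta$. Now instead suppose that $u$ and $v$ share a common neighbor $h$, and that both $\mathcal{T}[u]$ and $\mathcal{T}[v]$ are leaf components relative to $\mathcal{C}$. This latter fact implies that $\{\mathcal{T}[u], \mathcal{T}[v]\} \subseteq \mathcal{F}_h$, so Condition 4 implies that there exist $x \in V_{\mathrm{obs}}[\mathcal{T}[u]] = \mathcal{L}[u]$ and $y \in V_{\mathrm{obs}}[\mathcal{T}[v]] = \mathcal{L}[v]$ such that $\sigma_k(\Sigma_{x,y}) \ge \gamma_{\min} = (1+\epsilon)\theta$. By Lemma 8, $\sigma_k(\hat\Sigma_{x,y}) \ge \theta$. □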



Claim 3. Consider any pair $\{u,v\} \subseteq \mathcal{R}$ such that $\max\{\sigma_k(\hat\Sigma_{x,y}) : (x,y) \in \mathcal{L}[u] \times \mathcal{L}[v]\} \ge \theta$. If the first property from Claim 2 fails to hold for $\{u,v\}$, then Mergeable($\mathcal{R}, \mathcal{L}[\cdot], u, v$) = false.

Proof. This follows immediately from Lemma 12. □

Taken together, Claims 2 and 3 imply that the pair $\{u,v\} \subseteq \mathcal{R}$ selected by the first step in the while-loop indeed exists (so the iteration does not terminate in failure) and satisfies the properties in Claim 2.

Now we consider the second step of the while-loop, which is the call to the subroutine Relationship.

Claim 4. Suppose a pair $\{u,v\}$ satisfies the properties in Claim 2. Then Relationship($\mathcal{R}, \mathcal{L}[\cdot], \mathcal{T}[\cdot], u, v$) returns the correct relationship for $u$ and $v$. Specifically:

1. If $u$ and $v$ share a common neighbor in $T$ (and both are leaf components relative to $\mathcal{C}$), then "siblings" is returned.

2. If $u$ and $v$ are neighbors in $T$ and $\mathcal{T}[v]$ is a leaf component relative to $\mathcal{C}$ but $\mathcal{T}[u]$ is not, then "$u$ is parent of $v$" is returned.

3. If $u$ and $v$ are neighbors in $T$ and $\mathcal{T}[u]$ is a leaf component relative to $\mathcal{C}$ but $\mathcal{T}[v]$ is not, then "$v$ is parent of $u$" is returned.

4. If $u$ and $v$ are neighbors in $T$ and both $\mathcal{T}[u]$ and $\mathcal{T}[v]$ are leaf components relative to $\mathcal{C}$, and $u$ is a leaf in $T$ but $v$ is not, then "$v$ is parent of $u$" is returned.

5. If $u$ and $v$ are neighbors in $T$ and both $\mathcal{T}[u]$ and $\mathcal{T}[v]$ are leaf components relative to $\mathcal{C}$, and $v$ is a leaf in $T$ but $u$ is not, then "$u$ is parent of $v$" is returned.

6. If $u$ and $v$ are neighbors in $T$ and both $\mathcal{T}[u]$ and $\mathcal{T}[v]$ are leaf components relative to $\mathcal{C}$, and neither $u$ nor $v$ is a leaf in $T$, then "$u$ is parent of $v$" is returned.

Proof. Fix the pair {x, y) € C[v\ x L[v] guaranteed by the third property of Claimp^such that ak{Sx.y) > 0. 
Now we consider the possible relationships between u and v. 

Suppose u and v share a common neighbor h e Vhidp] in T, and that both 7"[m] and 7"[t'] are leaf 
components relative to C. We need to show that the subroutine Relationship asserts both "u ~/^ i;" and 
"t) 7^ m" . To show that "u ~/^ v" is asserted, we assume u is not a leaf (otherwise "w y^ w" is immediately 
asserted and we're done), let {ui, . . . , Uq} be the children of u in T[u], and take TZ[u] as defined in Relationship. 
By the first loop invariant, the subtrees in C[u\ are disjoint, and the leaf sets {C[r] : r S 7?.[w]} partition 
Vobs- In particular, x S C[ui] for some i e {1, . . . ,q}. Since Ui and v are not neighbors, and do not share 
a common neighbor. Therefore, by Lemma [T2| Mergeab\e{TZ[u],£[-],Ui,v) = false, so "m -/^ u" is asserted. 
A similar argument implies that "w -/^ m" is asserted. Since both "w -/^ w" and "u -/^ u" are asserted, the 
subroutine returns "siblings" . 

Now instead suppose u and v are neighbors. First, suppose T[u\ is a leaf component relative to C. We 
claim that if v is not a leaf, then "v -/^ u" is not asserted. Let {vi , . . . ,Vq}he the children of v in T[v] , and take 
TZ[v] = {u, vi, . . . , Vq} as defined in Relationship. By the first loop invariant, the subtrees in C[v] are disjoint. 



and the leaf sets {C[r] : r G 7^[w]} partition Vobs- By Lemma 14 T[u] and T[vi] are leaf components relative 



to C[v] for each i G {1, . . . , q}. For each i g {1, . . . , <?}, {u, Vi} share w as a common neighbor, and T[u\ and 



T[vi] are both leaf components relative to C[v]. Therefore by Lemma 11 Mergeab\e{TZ[v], £[■], u, Vi) — true 
for alH e {1, . . . , q}, so "w -/^ u" is not asserted. 
Suppose $\mathcal{T}[u]$ is a leaf component relative to $\mathcal{C}$ but $\mathcal{T}[v]$ is not. By Lemma 9, $v$ is not a leaf in $T$, so as argued above, "$v \not\to u$" is not asserted. It remains to show that "$u \not\to v$" is asserted. Assume $u$ is not a leaf (or else "$u \not\to v$" is immediately asserted and we're done), let $\{u_1, \dots, u_q\}$ be the children of $u$ in $\mathcal{T}[u]$, and take $\mathcal{R}[u]$ as defined in Relationship. By the first loop invariant, the subtrees in $\mathcal{C}[u]$ are disjoint, and the leaf sets $\{\mathcal{L}[r] : r \in \mathcal{R}[u]\}$ partition $V_{\mathrm{obs}}$. In particular, $x \in \mathcal{L}[u_i]$ for some $i \in \{1, \dots, q\}$. By Lemma 14, $\mathcal{T}[v]$ is not a leaf component relative to $\mathcal{C}[u]$. Moreover, $u_i$ and $v$ are not neighbors. Therefore by Lemma 12, Mergeable($\mathcal{R}[u], \mathcal{L}[\cdot], u_i, v$) = false, so "$u \not\to v$" is asserted. Since "$v \not\to u$" is not asserted but "$u \not\to v$" is asserted, the subroutine returns "$v \to u$". An analogous argument shows that if $\mathcal{T}[v]$ is a leaf component relative to $\mathcal{C}$ but $\mathcal{T}[u]$ is not, then the subroutine returns "$u \to v$".

Now suppose both $\mathcal{T}[u]$ and $\mathcal{T}[v]$ are leaf components relative to $\mathcal{C}$. By assumption, leaves in $T$ are only adjacent to non-leaves, so it cannot be that both $u$ and $v$ are leaves. Therefore at least one of $u$ and $v$ is not a leaf in $T$. Without loss of generality, say $v$ is not a leaf in $T$. Then as argued above, "$v \not\to u$" is not asserted. If $u$ is a leaf, then "$u \not\to v$" is asserted, so the subroutine returns "$v \to u$". If $u$ is not a leaf, then by symmetry, "$u \not\to v$" is not asserted. Therefore the subroutine returns "$u \to v$". □

Claim 4 implies that the remaining steps in the while-loop after the call to Relationship preserve the two loop invariants, simply by construction. □

There is one last lemma used in the proof of Lemma 13.

Lemma 14 (Leaf components). Suppose the invariants in Lemma 13 are satisfied. Then for each $u \in \mathcal{R}$ such that $u$ is not a leaf in $T$, the leaf components relative to the collection

\[ \mathcal{C}[u] := (\mathcal{C} \setminus \{\mathcal{T}[u]\}) \cup \{\mathcal{T}[v] : v \text{ is a child of } u \text{ in } \mathcal{T}[u]\} \]

are

\[ \{\mathcal{T}[r] : r \ne u \wedge \mathcal{T}[r] \text{ is a leaf component relative to } \mathcal{C}\} \cup \{\mathcal{T}[r] : r \text{ is a child of } u \text{ in } \mathcal{T}[u]\}. \]

Proof. Pick any $u \in \mathcal{R}$ such that $u$ is not a leaf in $T$. Let $\{v_1, \dots, v_q\}$ be the children of $u$ in $\mathcal{T}[u]$. By the first loop invariant, each $v_i$ is the root of a subtree $\mathcal{T}[v_i]$. This implies that the subtrees $\{\mathcal{T}[v_1], \dots, \mathcal{T}[v_q]\}$ are disjoint and $\{\mathcal{L}[v_1], \dots, \mathcal{L}[v_q]\}$ partition $\mathcal{L}[u]$. Therefore $ST[\mathcal{C}[u]]$ is the same as $ST[\mathcal{C}]$ except with the following changes.

1. $\mathcal{T}[u]$ is replaced with $u$.

2. For each $i$, $\mathcal{T}[v_i]$ is added with the edge $\{u, v_i\}$.

This means that each $\mathcal{T}[v_i]$ has degree one in $ST[\mathcal{C}[u]]$ and therefore is a leaf component relative to $\mathcal{C}[u]$. □